EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE
SEQUENCE RECOMBINATION 

CROSS-REFERENCE TO RELATED APPLICATION 

This application is a continuation-in-part of 09/116,188. The subject application claims
priority to this prior application, which is also incorporated by reference in its entirety
for all purposes.

FIELD OF THE INVENTION

The invention applies the technical field of molecular genetics to evolve the genomes of
cells and organisms to acquire new and improved properties.

BACKGROUND 

WO 98/31837 (PCT/US98/00852) provides pioneering technology for evolving the genome of
whole cells and organisms. One of skill will appreciate that the technology provided in
WO 98/31837 is fundamental to the ability of one of skill rapidly to evolve cells and whole
organisms. For example, the document teaches a variety of recursive methods of artificially
recombining nucleic acids in vivo, including entire genomes, and ways of selecting
resulting recombinant organisms.

This ability to evolve genes artificially is of fundamental importance. Cells have a number
of well-established uses in molecular biology, medicine and industrial processes. For
example, cells are commonly used as hosts for manipulating DNA in processes such as
transformation and recombination. Cells are used for expression of recombinant proteins
encoded by DNA transformed/transfected or otherwise introduced into the cells. Some types
of cells are used as progenitors for generation of transgenic animals and plants. Although
all of these processes are now routine, prior to the technology provided by WO 98/31837,
the genomes of the cells used in these processes had evolved little from the genomes of
natural cells, and particularly not toward acquisition of new or improved properties for
use in the above processes.

Additional methods of recursively recombining nucleic acids in vivo and selecting resulting
recombinants would be of use. The present invention provides a number of new and valuable
methods and compositions for whole and partial genome evolution.

SUMMARY OF THE INVENTION

In one aspect, the invention provides methods of evolving a cell to acquire a desired
function. Such methods entail, e.g., introducing a library of DNA fragments into a
plurality of cells, whereby at least one of the fragments undergoes recombination with a
segment in the genome or an episome of the cells to produce modified cells. Optionally,
these modified cells are bred to increase the diversity of the resulting recombined
cellular population. The modified cells, or the recombined cellular population, are then
screened for modified or recombined cells that have evolved toward acquisition of the
desired function. DNA from the modified cells that have evolved toward the desired function
is then optionally recombined with a further library of DNA fragments, at least one of
which undergoes recombination with a segment in the genome or the episome of the modified
cells to produce further modified cells. The further modified cells are then screened for
further modified cells that have further evolved toward acquisition of the desired
function. Steps of recombination and screening/selection are repeated as required until the
further modified cells have acquired the desired function. In one preferred embodiment,
modified cells are recursively recombined to increase diversity of the cells prior to
performing any selection steps on any resulting cells.

In some methods, the library or further library of DNA fragments is coated with recA
protein to stimulate recombination with the segment of the genome. The library of fragments
is optionally denatured to produce single-stranded DNA, which is annealed to produce
duplexes, some of which contain mismatches. Duplexes containing mismatches are optionally
selected by affinity chromatography to immobilized MutS.

Optionally, the desired function is secretion of a protein, and the plurality of cells
further comprises a construct encoding the protein. The protein is optionally inactive
unless secreted, and further modified cells are optionally selected for protein function.
Optionally, the protein is toxic to the plurality of cells, unless secreted. In this case,
the modified or further modified cells which evolve toward acquisition of the desired
function are screened by propagating the cells and recovering surviving cells.

In some methods, the desired function is enhanced recombination. In such methods, the
library of fragments sometimes comprises a cluster of genes collectively conferring
recombination capacity. Screening can be achieved using cells carrying a gene encoding a
marker whose expression is prevented by a mutation removable by recombination. The cells
are screened by their expression of the marker resulting from removal of the mutation by
recombination.

In some methods, the plurality of cells are plant cells and the desired property is
improved resistance to a chemical or microbe. The modified or further modified cells (or
whole plants) are exposed to the chemical or microbe and modified or further modified cells
having evolved toward the acquisition of the desired function are selected by their
capacity to survive the exposure.

In some methods, the plurality of cells are embryonic cells of an animal, and the 
method further comprises propagating the transformed cells to transgenic animals. 

The plurality of cells can be a plurality of industrial microorganisms that are enriched
for microorganisms which are tolerant to desired process conditions (heat, light,
radiation, selected pH, presence of detergents or other denaturants, presence of alcohols
or other organic molecules, etc.).

The invention further provides methods for performing in vivo recombination. At least first
and second segments from at least one gene are introduced into a cell, the segments
differing from each other in at least two nucleotides, whereby the segments recombine to
produce a library of chimeric genes. A chimeric gene having acquired a desired function is
selected from the library.

The invention further provides methods of predicting efficacy of a drug in treating a viral
infection. Such methods entail recombining a nucleic acid segment from a virus, whose
infection is inhibited by a drug, with at least a second nucleic acid segment from the
virus, the second nucleic acid segment differing from the first nucleic acid segment in at
least two nucleotides, to produce a library of recombinant nucleic acid segments. Host
cells are then contacted with a collection of viruses having genomes including the
recombinant nucleic acid segments in media containing the drug, and progeny viruses
resulting from infection of the host cells are collected.

A recombinant DNA segment from a first progeny virus recombines with at least a recombinant
DNA segment from a second progeny virus to produce a further library of recombinant nucleic
acid segments. Host cells are contacted with a collection of viruses having genomes
including the further library of recombinant nucleic acid segments, in media containing the
drug, and further progeny viruses are produced by the host cells. The recombination and
selection steps are repeated, as desired, until a further progeny virus has acquired a
desired degree of resistance to the drug, whereby the degree of resistance acquired and the
number of repetitions needed to acquire it provide a measure of the efficacy of the drug in
treating the virus. Viruses are optionally adapted to grow on particular cell lines.
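
The invention further provides methods of predicting efficacy of a drug in treating an
infection by a pathogenic microorganism. These methods entail delivering a library of DNA
fragments into a plurality of microorganism cells, at least some of which undergo
recombination with segments in the genome of the cells to produce modified microorganism
cells. Modified microorganisms are propagated in media containing the drug, and surviving
microorganisms are recovered. DNA from surviving microorganisms is recombined with a
further library of DNA fragments, at least some of which undergo recombination with cognate
segments in the DNA from the surviving microorganisms to produce further modified
microorganism cells. Further modified microorganisms are propagated in media containing the
drug, and further surviving microorganisms are collected. The recombination and selection
steps are repeated as needed, until a further surviving microorganism has acquired a
desired degree of resistance to the drug. The degree of resistance acquired and the number
of repetitions needed to acquire it provide a measure of the efficacy of the drug in
killing the pathogenic microorganism.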

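The invention further provides methods of evolving a cell to acquire a desired property.
These methods entail providing a population of different cells. The cells are cultured
under conditions whereby DNA is exchanged between cells, forming cells with hybrid genomes.
The cells are then screened or selected for cells that have evolved toward acquisition of a
desired property. The DNA exchange and screening/selecting steps are repeated, as needed,
with the screened/selected cells from one cycle forming the population of different cells
in the next cycle, until a cell has acquired the desired property.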

Mechanisms of DNA exchange include conjugation, phage-mediated transduction, liposome
delivery, protoplast fusion, and sexual recombination of the cells. Optionally, a library
of DNA fragments can be transformed or electroporated into the cells.

As noted, some methods of evolving a cell to acquire a desired property are effected by
protoplast-mediated exchange of DNA between cells. Such methods entail forming protoplasts
of a population of different cells. The protoplasts are then fused to form hybrid
protoplasts, in which genomes from the protoplasts recombine to form hybrid genomes. The
hybrid protoplasts are incubated under conditions promoting regeneration of cells. The
regenerated cells can be recombined one or more times [...] to increase the diversity of
any resulting cells. Preferably, regenerated cells are recombined several times, e.g., by
protoplast fusion, to generate a diverse population of cells.

The next step is to select or screen to isolate regenerated cells that have evolved toward
acquisition of the desired property. DNA exchange and selection/screening steps are
repeated, as needed, with regenerated cells in one cycle being used to form protoplasts in
the next cycle until the regenerated cells have acquired the desired property. Industrial
microorganisms are a preferred class of organisms for conducting the above methods. Some
methods further comprise a step of selecting or screening for fused protoplasts free from
unfused protoplasts of parental cells. Some methods further comprise a step of selecting or
screening for fused protoplasts with hybrid genomes free from cells with parental genomes.
In some methods, protoplasts are provided by treating individual cells, mycelia or spores
with an enzyme that degrades cell walls. In some methods, the strain is a mutant that is
lacking capacity for intact cell wall synthesis, and protoplasts form spontaneously. In
some methods, protoplasts are formed by treating growing cells with an inhibitor of cell
wall formation to generate protoplasts.

In some methods, the desired property is expression and/or secretion of a protein or
secondary metabolite, such as an industrial enzyme, a therapeutic protein, a primary
metabolite such as lactic acid or ethanol, or a secondary metabolite such as erythromycin,
cyclosporin A or taxol. In other methods it is the ability of the cell to convert compounds
provided to the cell to different compounds. In yet other methods, the desired property is
capacity for meiosis. In some methods, the desired property is compatibility to form a
heterokaryon with another strain.

The invention further provides methods of evolving a cell toward acquisition of a desired
property. These methods entail providing a population of different cells. DNA is isolated
from a first subpopulation of the different cells and encapsulated in liposomes.
Protoplasts are formed from a second subpopulation of the different cells. Liposomes are
fused with the protoplasts, whereby DNA from the liposomes is taken up by the protoplasts
and recombines with the genomes of the protoplasts. The protoplasts are incubated under
regenerating conditions. Regenerating or regenerated cells are then selected or screened
for evolution toward the desired property.

The invention further provides methods of evolving a cell toward acquisition of a desired
property using artificial chromosomes. Such methods entail introducing a DNA fragment
library cloned into an artificial chromosome into a population of cells. The cells are then
cultured under conditions whereby sexual recombination occurs between the cells, and DNA
fragments cloned into the artificial chromosome recombine by homologous
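recombination with corresponding segments of endogenous chromosomes of the population of
cells, and endogenous chromosomes recombine with each other. Cells can also be recombined
via conjugation. [...] In any case, after generating a diverse library of cells, the cells
that have evolved toward acquisition of the desired property are then selected and/or
screened. [...] Here again, multiple cycles of recombination are optionally performed prior
to any additional selection or screening steps.

The invention further provides methods of evolving a DNA segment cloned into an artificial
chromosome for acquisition of a desired property. These methods entail providing a library
of variants of the segment, each variant cloned into separate copies of an artificial
chromosome. The copies of the artificial chromosome are introduced into a population of
cells. The cells are cultured under conditions whereby sexual recombination occurs between
cells and homologous recombination occurs between copies of the artificial chromosome
bearing the variants. Variants are then screened or selected for evolution toward
acquisition of the desired property.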

The invention further provides hyperrecombinogenic recA proteins.

The invention also provides methods of iterative pooling and breeding of higher organisms.
In the methods, a library of diverse multicellular organisms is produced (e.g., plants,
animals or the like). A pool of male gametes is provided along with a pool of female
gametes. At least one of the male pool or the female pool comprises a plurality of
different gametes [...]. The male gametes are used to fertilize the female gametes. At
least a portion of the resulting fertilized gametes grow into reproductively viable
organisms. These reproductively viable organisms are crossed (e.g., by pairwise pooling and
joining of the male and female gametes as before) to produce a library of diverse
organisms. The library is then selected for a desired trait or property.

The library of diverse organisms can comprise a plurality of plants such as Gramineae,
Festucoideae, Poacoideae, [...] Oryza, Triticum, Secale, Avena, Hordeum, Saccharum, Poa,
Festuca, Stenotaphrum, Cynodon, Coix, Olyreae, Phareae, Compositae or Leguminosae. For
example, the plants can be corn, rice, wheat, rye, oats, barley, pea, beans, lentil,
peanut, yam bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine, vetch, lotus,
sweet clover, wisteria, sweetpea, sorghum, millet, sunflower, canola or the like.

Similarly, the library of diverse organisms can include a plurality of animals such as
non-human mammals, fish, insects, or the like.

Optionally, a plurality of selected library members can be crossed by pooling gametes from
the selected members and repeatedly crossing any resulting additional reproductively viable
organisms to produce a second library of diverse organisms (e.g., by split pairwise pooling
and rejoining of the male and female gametes). Here again, the second library can be
selected for a desired trait or property, with the resulting selected members forming the
basis for additional poolwise breeding and selection.

A feature of the invention is the libraries made by these (or any preceding) methods.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1, panels A-D: Scheme for in vitro shuffling of genes.
Fig. 2: Scheme for enriching for mismatched sequences using MutS.
Fig. 3: Alternative scheme for enriching for mismatched sequences using MutS.
Fig. 4: Scheme for evolving growth hormone genes to produce larger fish.
Fig. 5: Scheme for shuffling prokaryotes by protoplast fusion.
Fig. 6: Scheme for introducing a sexual cycle into fungi previously incapable of sexual
reproduction.
Fig. 7: General scheme for shuffling of fungi by protoplast fusion.
Fig. 8: Shuffling fungi by protoplast fusion with protoplasts generated by use of
inhibitors of enzymes responsible for cell wall formation.
Fig. 9: Shuffling fungi by protoplast fusion using fungal strains deficient in cell-wall
synthesis that spontaneously form protoplasts.
Fig. 10: YAC-mediated whole genome shuffling of Saccharomyces cerevisiae and related
organisms.
Fig. 11: YAC-mediated shuffling of large DNA fragments.
Fig. 12: (A, B, C and D) DNA sequences of a wildtype recA protein and five
hyperrecombinogenic variants thereof.
Fig. 13: Amino acid sequences of a wildtype recA protein and five hyperrecombinogenic
variants thereof.
Fig. 14: Illustration of combinatoriality.
Fig. 15: Repeated [...]
Fig. 16: [...] strategies.
Fig. 17: Graphs of asexual sequential mutagenesis and sexual recursive recombination.

Fig. 18: Schematic for non-homologous recombination.
Fig. 19: Schematic for split and pool strategy.
Fig. 20, panel A: Schematic for selectable/counterselectable marker strategy.
Fig. 20, panel B: Schematic for selectable/counterselectable marker strategy for RecA.
Fig. 21: Plant regeneration strategy for regenerating salt-tolerant plants.
Fig. 22: Whole genome shuffling of parsed (subcloned) genomes.
Fig. 23: Schematic for blind cloning of gene homologs.
Fig. 24: High throughput family shuffling.
Fig. 25: Schematic and graph of poolwise recombination.
Fig. 26: Schematic of protoplast fusion.

Fig. 28: Schematic of halo assay and integrated system.
Fig. 29: Schematic drawing illustrating recursive pooled breeding of fish.
Fig. 30: Schematic drawing illustrating recursive pooled breeding of plants.
Fig. 31: Schematic for shuffling of S. coelicolor.
Fig. 32: Schematic drawing illustrating HTP actinorhodin assay.
Fig. 33: Schematic drawing and table illustrating whole genome shuffling of four parental
strains.
Fig. 34: Schematic drawing of WGS through organized heteroduplex shuffling.

DETAILED DESCRIPTION

I. GENERAL

A. THE BASIC APPROACH

The invention provides methods for artificially evolving cells to acquire a new or improved
property by recursive sequence recombination. Briefly, recursive sequence recombination
entails successive cycles of recombination to generate molecular diversity and
screening/selection to take advantage of that molecular diversity. That is, a family of
nucleic acid molecules is created showing substantial sequence and/or structural identity
but differing as to the presence of mutations. These sequences are then recombined in any
of the described formats so as to optimize the diversity of mutant combinations represented
in the resulting recombined library. Typically, any resulting recombinant nucleic acids or
genomes are recursively recombined for one or more cycles of recombination to increase the
diversity of resulting products. After this recursive recombination procedure, the final,
resulting products are screened and/or selected for a desired trait or property.

Alternatively, each recombination cycle can be followed by at least one cycle of screening
or selection for molecules having a desired characteristic. In this embodiment, the
molecule(s) selected in one round form the starting materials for generating diversity in
the next round.
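
The alternation of recombination and screening described above can be illustrated with a
toy simulation. The following Python sketch is not part of the disclosed methods. It
assumes, purely for illustration, that a genome is a list of sites, that recombination is a
uniform per-site choice between two parents, and that screening keeps the highest-scoring
genomes. The names recombine and evolve, the fitness function (a simple sum) and all
parameter values are hypothetical.

```python
import random

def recombine(parent_a, parent_b, rng):
    """Toy recombination: each site is inherited from one of the two parents."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

def evolve(population, fitness, cycles, keep, rng):
    """Alternate a recombination step (diversity) with a screening step (keep the best)."""
    for _ in range(cycles):
        # Recombination: breed random pairs to create new combinations of mutations.
        offspring = []
        for _ in range(len(population)):
            parent_a, parent_b = rng.sample(population, 2)
            offspring.append(recombine(parent_a, parent_b, rng))
        # Screening: rank parents and offspring together and carry the best forward,
        # so the selected genomes of one round are the starting material of the next.
        pool = population + offspring
        population = sorted(pool, key=fitness, reverse=True)[:keep]
    return population

rng = random.Random(0)
# Hypothetical starting variants: 8 sites, 1 = beneficial mutation present.
# Each variant carries a single, different beneficial mutation.
start = [[1 if i == j else 0 for i in range(8)] for j in range(8)]
best = evolve(start, fitness=sum, cycles=6, keep=8, rng=rng)
print(max(sum(genome) for genome in best))
```

Because parents are retained in the screened pool, the best score in this sketch can never
decrease from one cycle to the next; real screens are noisier and real recombination is
not uniform per site.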

The cells to be evolved can be bacteria, archaebacteria, or eukaryotic cells and can
constitute a homogeneous cell line or mixed culture. Suitable cells for evolution include
the bacterial and eukaryotic cell lines commonly used in genetic engineering, protein
expression, or the industrial production or conversion of proteins, enzymes, primary
metabolites, secondary metabolites, fine, specialty or commodity chemicals. Suitable
mammalian cells include those from, e.g., mouse, rat, hamster, primate, and human, both
cell lines and primary cultures. Such cells include stem cells, including embryonic stem
cells and hemopoietic stem cells, zygotes, fibroblasts, lymphocytes, Chinese hamster ovary
(CHO), mouse fibroblasts (NIH3T3), kidney, liver, muscle, and skin cells. Other eukaryotic
cells of interest include plant cells, such as maize, rice, wheat, cotton, soybean,
sugarcane, tobacco, and arabidopsis; fish, algae, fungi (penicillium, aspergillus,
podospora, neurospora, saccharomyces), insect (e.g., baculo lepidoptera), yeast (pichia and
saccharomyces, Schizosaccharomyces pombe). Also of interest are many bacterial cell types,
both gram-negative and gram-positive, such as Bacillus subtilis, B. licheniformis,
B. cereus, Escherichia coli, Streptomyces, Pseudomonas, Salmonella, Actinomycetes,
Lactobacillus, Acetonitcbacter, Deinococcus, and Erwinia. The complete genome sequences of
E. coli and Bacillus subtilis are described by Blattner et al., Science 277, 1454-1462
(1997) and Kunst et al., Nature 390, 249-256 (1997), respectively.

Evolution commences by generating a population of variant cells. Typically, 
the cells in the population are of the same type but represent variants of a progenitor cell. In 
some instances, the variation is natural, as when different cells are obtained from
different individuals within a species, from different species or from different genera. In
other instances, variation is induced by mutagenesis of a progenitor cell. Mutagenesis can
be effected by subjecting the cell to mutagenic agents, or if the cell is a mutator cell
(e.g., has mutations in genes involved in DNA replication, recombination and/or repair
which favor introduction of mutations) simply by propagating the mutator cells. Mutator
cells can be generated from successive selections for simple phenotypic changes (e.g.,
acquisition of rifampicin-resistance, then nalidixic acid resistance, then lac- to lac+
(see Mao et al., J. Bacteriol. 179, 417-422 (1997))), or by exposure to specific inhibitors
of cellular factors that result in the mutator phenotype. These could be inhibitors of
mutS, mutL, mutD, recD, mutY, mutM, dam, uvrD and the like.



More generally, mutations are induced in cell populations using any available mutation
technique. Common [...] mutations in mutS, mutT, [...]; chemical mutagenesis, e.g., use of
inhibitors of MMR, DNA damage inducible genes, or SOS inducers; overproduction/
underproduction/mutation of any component of the homologous recombination complex/pathway,
e.g., RecA, ssb, etc.; overproduction/underproduction/mutation of genes [...];
overproduction/underproduction/mutation of [...] organisms; addition of chi sites
into/flanking the donor DNA fragments; coating the DNA fragments with RecA/ssb; and the
like.

In other instances, variation is the result of transferring a library of DNA fragments into
the cells (e.g., by conjugation, protoplast fusion, liposome fusion, transformation,
transduction or natural competence). At least one, and usually many, of the fragments in
the library show some, but not complete, sequence or structural identity with a cognate or
allelic gene within the cells sufficient to allow homologous recombination to occur. For
example, in one embodiment, homologous integration of a plasmid carrying a shuffled gene or
metabolic pathway leads to insertion of the plasmid-borne sequences adjacent to the genomic
copy. Optionally, a counter-selectable marker strategy is used to select for recombinants
in which recombination occurred between the homologous sequences, leading to elimination of
the counter-selectable marker. A variety of selectable and counter-selectable markers are
amply illustrated in the art. For a list of useful
markers, see Berg and Berg (1996), "Transposable element tools for microbial genetics,"
Escherichia coli and Salmonella, Neidhardt, Washington, D.C., ASM Press. 2: 2588-2612;
LaRossa, ibid., 2527-2587. This strategy can be recursively repeated to maximize sequence
diversity of targeted genes prior to screening/selection for a desired trait or property.

The library of fragments can derive from one or more sources. One source of fragments is a
genomic library of fragments from a different species, cell type, organism or individual
from the cells being transfected. In this situation, many of the fragments in the library
have a cognate or allelic gene in the cells being transformed but differ from that gene due
to the presence of naturally occurring species variation, polymorphisms, mutations, and the
presence of multiple copies of some homologous genes in the genome. Alternatively, the
library can be derived from DNA from the same cell type as is being transformed after that
DNA has been subject to induced mutation, by conventional methods, such as radiation,
error-prone PCR, growth in a mutator organism, transposon mutagenesis, or cassette
mutagenesis. Alternatively, the library can derive from a genomic library of fragments
generated from the pooled genomic DNA of a population of cells having the desired
characteristics.

In any of these situations, the genomic library can be a complete genomic library or
subgenomic library deriving, for example, from a selected chromosome, or part of a
chromosome or an episomal element within a cell. As well as, or instead of, these sources
of DNA fragments, the library can contain fragments representing natural or selected
variants of selected genes of known function (i.e., focused libraries).

The number of fragments in a library can vary from a single fragment to about 10^10, with
libraries having from 10^3 to 10^8 fragments being common. The fragments should be
sufficiently long that they can undergo homologous recombination and sufficiently short
that they can be introduced into a cell, and if necessary, manipulated before introduction.
Fragment sizes can range from about 10 b to about 20 Mb. Fragments can be double- or
single-stranded.

The fragments can be introduced into cells as whole genomes or as components of viruses,
plasmids, YACs, HACs or BACs or can be introduced as they are, in which case all or most of
the fragments lack an origin of replication. Use of viral fragments with single-stranded
genomes offers the advantage of delivering fragments in single-stranded form, which
promotes recombination. The fragments can also be joined to a selective marker before
[...] longer period of time after introduction into the cell in which fragments can undergo
recombination with a cognate gene [...] capable of permanent retention in the cell line.
Such a vector can transiently express a marker for a sufficient time to screen for or
select a cell bearing the vector (e.g., because cells transduced by the vector are the
target cell type to be screened [...]), but is then degraded or otherwise rendered
incapable of expressing the marker. The use of such vectors can be advantageous in
performing optional subsequent rounds of recombination [...] alone will not allow the
vector to be established. Jensen & Gerdes, Mol. Microbiol. 17, 205-210 (1995); Bernard et
al., Gene 162, 159-160. Alternatively, a vector can be rendered suicidal by incorporation
of a defective origin of replication (e.g., a temperature-sensitive origin of replication)
or by omission of an origin of replication. Vectors can also be rendered suicidal by
inclusion of negative selection markers, such as ura3 in yeast or sacB in many bacteria.
These genes become toxic only in the presence of specific compounds. Such vectors can be
[...] Berg and Berg (1996), "Transposable element tools for microbial genetics,"
Escherichia coli and Salmonella, Neidhardt, Washington, D.C., ASM Press. 2: 2588-2612.
Similarly, a list of counterselectable markers, generally applicable to vector selection,
is also found in Berg and Berg. See also LaRossa (1996), "Mutant selections linking
physiology, inhibitors, and genotypes," Escherichia coli and Salmonella, F. C. Neidhardt,
Washington, D.C., ASM Press. 2: 2527-2587.

After introduction into cells, the fragments can recombine with DNA present in the genome
or episomes of the cells by homologous, nonhomologous or site-specific recombination. For
present purposes, homologous recombination makes the most significant contribution to
evolution of the cells because this form of recombination amplifies the existing diversity
between the DNA of the cells being transfected and the DNA fragments. For example, if a DNA
fragment being transfected differs from a cognate or allelic gene at two positions, there
are four possible recombination products, and each of these recombination
products can be formed in different cells in the transformed population. Thus, homologous
recombination of the fragment doubles the initial diversity in this gene. When many
fragments recombine with corresponding cognate or allelic genes, the diversity of
recombination products with respect to starting products increases exponentially with the
number of mutations. Recombination results in modified cells having modified genomes and/or
episomes. Recursive recombination prior to selection further increases diversity of
resulting modified cells.
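
The counting argument above (two differing positions giving four possible products) can be
checked with a short enumeration. The Python sketch below is an illustration only: it
assumes an idealized case in which a crossover can fall between any pair of differing
positions, so that each differing position is inherited independently, giving 2^n products
for n differences. The function name recombination_products and the example sequences are
hypothetical.

```python
from itertools import product

def recombination_products(resident, fragment):
    """Enumerate every sequence that takes, at each position, either the resident
    gene's base or the incoming fragment's base (idealized free recombination)."""
    if len(resident) != len(fragment):
        raise ValueError("sequences must be aligned and of equal length")
    # One choice where the sequences agree, two choices where they differ.
    choices = [sorted({r, f}) for r, f in zip(resident, fragment)]
    return {"".join(bases) for bases in product(*choices)}

resident = "ACGTAC"  # hypothetical cognate gene segment in the cell
fragment = "ATGTCC"  # hypothetical library fragment, differing at two positions
products = recombination_products(resident, fragment)
print(len(products), sorted(products))
```

The two parental sequences are among the four products, which is why the passage describes
recombination at two positions as doubling the initial diversity (from two sequences to
four).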

The variant cells, whether the result of natural variation, mutagenesis, or recombination,
are screened or selected to identify a subset of cells that have evolved toward acquisition
of a new or improved property. The nature of the screen, of course, depends on the property
and several examples will be discussed below. Typically, recombination is repeated before
initial screening. Optionally, however, the screening can also be repeated before
performing subsequent cycles of recombination. Stringency can be increased in repeated
cycles of screening.

The subpopulation of cells surviving screening is optionally subjected to a further round
of recombination. In some instances, the further round of recombination is effected by
propagating the cells under conditions allowing exchange of DNA between cells. For example,
protoplasts can be formed from the cells, allowed to fuse, and regenerated. Cells with
recombinant genomes are propagated from the fused protoplasts. Alternatively, exchange of
DNA can be promoted by propagation of cells or protoplasts in an electric field. For cells
having a conjugative transfer apparatus, exchange of DNA can be promoted simply by
propagating the cells.

In other methods, the further round of recombination is performed by a split and pool
approach. That is, the surviving cells are divided into two pools. DNA is isolated from one
pool, and if necessary amplified, and then transformed into the other pool. Accordingly,
DNA fragments from the first pool constitute a further library of fragments and recombine
with cognate fragments in the second pool resulting in further diversity. An example of
this strategy is illustrated in Fig. 19. As shown, a pool of mutant bacteria with
improvements in a desired phenotype is obtained and split. Genes are obtained from one
half, e.g., by PCR, by cloning of random genomic fragments, by infection with a transducing
phage and harvesting transducing particles, or by the introduction of an origin of transfer
(OriT) randomly into the relevant chromosome to create a donor population of cells capable
of transferring random fragments by conjugation to an acceptor population. These genes are
then shuffled (in vitro by known methods or in vivo as taught herein), or simply cloned
into an allele replacement vector (e.g., one carrying selectable and counterselectable
markers). The gene pool is then transformed into the other half of the original mutant pool
and recombinants are selected and screened for further improvements in phenotype. These
best variants are used as the starting point for the next cycle. Alternatively, recursive
recombination by any of the methods noted can be performed prior to screening, thereby
increasing the diversity of the population of cells to be screened.
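
The bookkeeping of a split and pool round can be sketched in a few lines of Python. This is
a hypothetical model, not a protocol: it assumes each surviving cell is represented by a
list of alleles, that the pool is split at random into a donor half and a recipient half,
and that recombination in a recipient is a per-site choice between the donor and recipient
alleles. The name split_and_pool and the example data are invented for illustration.

```python
import random

def split_and_pool(survivors, rng):
    """One toy split-and-pool round: one half donates DNA, the other half receives it."""
    cells = list(survivors)
    rng.shuffle(cells)
    half = len(cells) // 2
    donors, recipients = cells[:half], cells[half:]
    recombinants = []
    for recipient in recipients:
        # Each recipient takes up DNA from one randomly chosen donor genome.
        donor = rng.choice(donors)
        # Model recombination as a per-site choice between donor and recipient alleles.
        recombinants.append([rng.choice(pair) for pair in zip(donor, recipient)])
    return recombinants

rng = random.Random(0)
# Hypothetical survivors of a screen: six genomes of five sites, labeled by strain number.
survivors = [[strain] * 5 for strain in range(6)]
for genome in split_and_pool(survivors, rng):
    print(genome)
```

Each output genome mixes alleles from at most two of the surviving strains, one donor and
one recipient, which is the additional diversity the split and pool step is meant to
create before the next screen.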

In other methods, some or all of the cells surviving screening are transfected
with a fresh library of DNA fragments, which can be the same or different from the library
used in the first round of recombination. In this situation, the genes in the fresh library
undergo recombination with cognate genes in the surviving cells. If genes are introduced as
components of a vector, compatibility of this vector with any vector used in a previous round
of transfection should be considered. If the vector used in a previous round was a suicide
vector, there is no problem of incompatibility. If, however, the vector used in a previous
round was not a suicide vector, a vector having a different incompatibility origin should be
used in the subsequent round. In all of these formats, further recombination generates
additional diversity in the DNA component of the cells, resulting in further modified cells.
The further modified cells are subjected to another round of screening/selection
according to the same principles as the first round. Screening/selection identifies a
subpopulation of further modified cells that have further evolved toward acquisition of the
property. This subpopulation of cells can be subjected to further rounds of recombination and
screening according to the same principles, optionally with the stringency of screening being
increased at each round. Eventually, cells are identified that have acquired the desired
property.

A. DEFINITIONS

The term cognate refers to a gene sequence that is evolutionarily and
functionally related between species. For example, in the human genome, the human CD4
gene is the cognate gene to the mouse CD4 gene, since the sequences and structures of these
two genes indicate that they are homologous and that both genes encode a protein which
functions in signaling T-cell activation through MHC class II-restricted antigen recognition.

Screening is, in general, a two-step process in which one first determines which
cells do and do not express a screening marker or phenotype (or a selected level of marker or
phenotype), and then physically separates the cells having the desired property. Selection is a
form of screening in which identification and physical separation are achieved simultaneously
by expression of a selection marker, which, in some genetic circumstances, allows cells
expressing the marker to survive while other cells die (or vice versa). Screening markers
include luciferase, β-galactosidase, and green fluorescent protein. Selection markers include
drug and toxin resistance genes.

An exogenous DNA segment is one foreign (or heterologous) to the cell or 
homologous to the cell but in a position within the host cell nucleic acid in which the element 
is not ordinarily found. Exogenous DNA segments can be expressed to yield exogenous 
polypeptides. 

The term "gene" is used broadly to refer to any segment of DNA associated
with a biological function. Thus, genes include coding sequences and/or the regulatory
sequences required for their expression. Genes also include nonexpressed DNA segments
that, for example, form recognition sequences for other proteins.

The terms "identical" or "percent identity," in the context of two or more
nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that
are the same or have a specified percentage of amino acid residues or nucleotides that are the
same, when compared and aligned for maximum correspondence, as measured using one of
the following sequence comparison algorithms or by visual inspection.

The phrase "substantially identical," in the context of two nucleic acids or
polypeptides, refers to two or more sequences or subsequences that have at least 60%,
preferably 80%, most preferably 90-95% nucleotide or amino acid residue identity, when
compared and aligned for maximum correspondence, as measured using one of the following
sequence comparison algorithms or by visual inspection. Preferably, the substantial identity
exists over a region of the sequences that is at least about 50 residues in length, more
preferably over a region of at least about 100 residues, and most preferably the sequences are
substantially identical over at least about 150 residues. In a most preferred embodiment, the
sequences are substantially identical over the entire length of the coding regions.
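
As a minimal sketch of how these definitions might be applied to two sequences that have
already been aligned, the following Python example computes percent identity and checks
the 60% / 50-residue criteria. The function names, the use of "-" as a gap character, and
the choice to skip gap columns are assumptions made for illustration; other conventions
for counting gaps exist.

```python
def identity_stats(aligned_a, aligned_b):
    """Return (percent identity, columns compared) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # assumed convention: gap columns are not counted
        compared += 1
        if x == y:
            matches += 1
    percent = 100.0 * matches / compared if compared else 0.0
    return percent, compared

def substantially_identical(aligned_a, aligned_b, threshold=60.0, min_residues=50):
    """Apply the 'at least 60% over at least about 50 residues' criteria."""
    percent, compared = identity_stats(aligned_a, aligned_b)
    return compared >= min_residues and percent >= threshold
```

For example, two aligned 60-residue sequences that differ at 7 positions satisfy the
default criteria, whereas a 5-residue comparison does not, regardless of its identity.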

For sequence comparison, typically one sequence acts as a reference sequence,
to which test sequences are compared. When using a sequence comparison algorithm, test and
reference sequences are input into a computer, subsequence coordinates are designated, if
necessary, and sequence algorithm program parameters are designated. The sequence
comparison algorithm then calculates the percent sequence identity for the test sequence(s)
relative to the reference sequence, based on the designated program parameters.

Optimal alignment of sequences for comparison can be conducted by the
local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the
homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the
search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444
(1988), or by computerized implementations of algorithms GAP, BESTFIT, FASTA, and
TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer
Group, 575 Science Dr., Madison, WI.
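
To illustrate the kind of computation a local homology algorithm performs, the following
Python sketch scores the best local alignment of two sequences by dynamic programming in
the style of Smith & Waterman. It is a simplified sketch, not the implementation in any
of the packages named above: the scoring values (match +2, mismatch -1, linear gap -2)
and the function name are assumptions chosen only for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Only the score is returned here; recovering the aligned residues themselves would
require keeping the full matrix and tracing back from the highest-scoring cell.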

Another example of a useful alignment algorithm is PILEUP. PILEUP creates
a multiple sequence alignment from a group of related sequences using progressive, pairwise
alignments to show relationship and percent sequence identity. It also plots a tree or
dendogram showing the clustering relationships used to create the alignment. PILEUP uses a
simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 35:351-
360 (1987). The method used is similar to the method described by Higgins & Sharp,
CABIOS 5:151-153 (1989). The program can align up to 300 sequences, each of a maximum
length of 5,000 nucleotides or amino acids. The multiple alignment procedure begins with the
pairwise alignment of the two most similar sequences, producing a cluster of two aligned
sequences. This cluster is then aligned to the next most related sequence or cluster of aligned
sequences. Two clusters of sequences are aligned by a simple extension of the pairwise
alignment of two individual sequences. The final alignment is achieved by a series of
progressive, pairwise alignments. The program is run by designating specific sequences and
their amino acid or nucleotide coordinates for regions of sequence comparison and by
designating the program parameters. For example, a reference sequence can be compared to
other test sequences to determine the percent sequence identity relationship using the
following parameters: default gap weight (3.00), default gap length weight (0.10), and
weighted end gaps.

Another example of an algorithm that is suitable for determining percent sequence
identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al.,
J. Mol. Biol. 215:403-410 (1990). Software for performing BLAST analyses is publicly
available through the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring
sequence pairs (HSPs) by identifying short words of length W in the query sequence which
either match or satisfy some positive-valued threshold score T when aligned with a word of
the same length in a database sequence. T is referred to as the neighborhood word score
threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for
initiating searches to find longer HSPs containing them. The word hits are then extended in
both directions along each sequence for as far as the cumulative alignment score can be
increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters
M (reward score for a pair of matching residues; always > 0) and N (penalty score for
mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to
calculate the cumulative score. Extension of the word hits in each direction is halted when:
the cumulative alignment score falls off by the quantity X from its maximum achieved value;
the cumulative score goes to zero or below, due to the accumulation of one or more negative-
scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm
parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN
program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation
(E) of 10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the
BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the
BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915
(1989)).
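
The word-hit extension step described above can be sketched in a few lines of Python.
This is a simplified, ungapped illustration for nucleotide sequences using M=5 and N=-4
and the three halting conditions listed; it is not the BLAST implementation itself. The
function names, the default value of X (`x_drop=20`), and the returned tuple layout are
assumptions made for the example.

```python
def _extend(query, subject, q, s, step, base, x_drop, m, n):
    """Walk in one direction; return (best score gain, residues added)."""
    delta = best = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        delta += m if query[q] == subject[s] else n
        length += 1
        if delta > best:
            best, best_len = delta, length
        # Halt on a score of zero or below, or on a drop of X from the maximum.
        if base + delta <= 0 or best - delta >= x_drop:
            break
        q += step
        s += step
    return best, best_len

def extend_hit(query, subject, q_start, s_start, word_len, x_drop=20, m=5, n=-4):
    """Extend a word hit both ways; return (score, query start, query end)."""
    seed = sum(m if query[q_start + k] == subject[s_start + k] else n
               for k in range(word_len))
    r_gain, r_len = _extend(query, subject, q_start + word_len,
                            s_start + word_len, 1, seed, x_drop, m, n)
    l_gain, l_len = _extend(query, subject, q_start - 1, s_start - 1,
                            -1, seed + r_gain, x_drop, m, n)
    return seed + r_gain + l_gain, q_start - l_len, q_start + word_len + r_len

# Assumed example: a 4-base seed at offset 2 in both sequences.
hsp = extend_hit("TTACGTACGTAA", "GGACGTACGTCC", 2, 2, 4)
```

In this example the seed grows to the eight matching bases shared by both sequences and
stops at the ends of the sequences, giving a score of 8 x 5 = 40.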
In addition to calculating percent sequence identity, the BLAST algorithm also
performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin &
Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity
provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an
indication of the probability by which a match between two nucleotide or amino acid
sequences would occur by chance. For example, a nucleic acid is considered similar to a
reference sequence if the smallest sum probability in a comparison of the test nucleic acid to
the reference nucleic acid is less than about 0.1, more preferably less than about 0.01, and
most preferably less than about 0.001.

A further indication that two nucleic acid sequences or polypeptides are
substantially identical is that the polypeptide encoded by the first nucleic acid is
immunologically cross reactive with the polypeptide encoded by the second nucleic acid, as
described below. Thus, a polypeptide is typically substantially identical to a second
polypeptide, for example, where the two peptides differ only by conservative substitutions.
Another indication that two nucleic acid sequences are substantially identical is that the two
molecules hybridize to each other under stringent conditions.

The term "naturally-occurring" is used to describe an object that can be found
in nature. For example, a polypeptide or polynucleotide sequence that is present in an
organism (including viruses) that can be isolated from a source in nature and which has not
been intentionally modified by man in the laboratory is naturally-occurring. Generally, the
term naturally-occurring refers to an object as present in a non-pathological (undiseased)
individual, such as would be typical for the species.

Asexual recombination is recombination occurring without the fusion of
gametes to form a zygote.

A "mismatch repair deficient strain" can include any mutants in any organism
impaired in the functions of mismatch repair. These include mutant gene products of
mutS, mutT, mutH, mutL, uvrD, dcm, vsr, umuC, umuD, sbcB, recJ, etc. The impairment is
achieved by genetic mutation, allelic replacement, selective inhibition by an added reagent such
as a small compound or an expressed antisense RNA, or other techniques. Impairment can be
of the genes noted, or of homologous genes in any organism.

III. VARIATIONS

A. COATING FRAGMENTS WITH RECA PROTEIN
The frequency of homologous recombination between library fragments and
cognate endogenous genes can be increased by coating the fragments with a recombinogenic
protein. See [...] (1996); Sena & Zarling, Nature Genetics 3, 365 (1996); Revet et al., J. Mol.
Biol. 232, 779-791 (1993); Kowalczykowski & Zarling in Gene Targeting (CRC 1995), Ch. 7.
The recombinogenic protein promotes homologous pairing and/or strand exchange. The best
characterized recA protein is from E. coli and is available from Pharmacia (Piscataway, NJ).
In addition to the wild-type protein, a number of mutant recA-like proteins have been
identified (e.g., recA803). Further, many organisms have recA-like recombinases with strand
transfer activities (e.g., Ogawa et al., Cold Spring Harbor Symposium on Quantitative
Biology 18, 567-576 (1993); Johnson & Symington, Mol. Cell. Biol. 15, 4843-4850 (1995);
Fugisawa et al., Nucl. Acids Res. 13, 7473 (1985); Hsieh et al., Cell 44, 885 (1986); Hsieh et
al., J. Biol. Chem. 264, 5089 (1989); Fishel et al., Proc. Natl. Acad. Sci. USA 85, 3683
(1988); Cassuto et al., Mol. Gen. Genet. 208, 10 (1987); Ganea et al., Mol. Cell Biol. 7, 3124
(1987); Moore et al., J. Biol. Chem. 19, 11108 (1990); Keene et al., Nucl. Acids Res. 12,
3057 (1984); Kmiec, Cold Spring Harbor Symp. 48, 675 (1984); Kmiec, Cell 44, 545
(1986); Kolodner et al., Proc. Natl. Acad. Sci. USA 84, 5560 (1987); Sugino et al., Proc.
Natl. Acad. Sci. USA 85, 3683 (1985); Halbrook et al., J. Biol. Chem. 264, 21403 (1989);
Eisen et al., Proc. Natl. Acad. Sci. USA 85, 7481 (1988); McCarthy et al., Proc. Natl. Acad.
Sci. USA 85, 5854 (1988); Lowenhaupt et al., J. Biol. Chem. 264, 20568 (1989)). Examples
of such recombinase proteins include recA, recA803, uvsX (Roca, A. I., Crit. Rev. Biochem.
Molec. Biol. 25, 415 (1990)), sep1 (Kolodner et al., Proc. Natl. Acad. Sci. (U.S.A.) 84, 5560
(1987); Tishkoff et al., Molec. Cell. Biol. 11, 2593), RuvC (Dunderdale et al., Nature 354,
506 (1991)), DST2, KEM1, XRN1 (Dykstra et al., Molec. Cell. Biol. 11, 2583 (1991)),
STPα/DST1 (Clark et al., Molec. Cell. Biol. 11, 2576 (1991)), HPP-1 (Moore et al., Proc.
Natl. Acad. Sci. (U.S.A.) 88, 9067 (1991)), other eukaryotic recombinases (Bishop et al., Cell
69, 439 (1992); Shinohara et al., Cell 69, 457).

RecA protein forms a nucleoprotein filament when it coats a single-stranded
DNA. In this nucleoprotein filament, one monomer of recA protein is bound to about 3
nucleotides. This property of recA to coat single-stranded DNA is essentially sequence
independent, although particular sequences favor initial loading of recA onto a polynucleotide
(e.g., nucleation sequences). The nucleoprotein filament(s) can be formed on essentially any
DNA to be shuffled and can form complexes with both single-stranded and double-stranded
DNA in prokaryotic and eukaryotic cells.
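
As a worked illustration of the stated stoichiometry of about one recA monomer per 3
nucleotides, the following Python sketch estimates how many monomers would be needed to
coat a single-stranded fragment of a given length. The function name is hypothetical and
the result is only an approximation, since the ratio itself is approximate.

```python
def reca_monomers_to_coat(fragment_nt, nt_per_monomer=3):
    """Approximate monomers needed to coat one single-stranded fragment."""
    if fragment_nt < 0 or nt_per_monomer <= 0:
        raise ValueError("lengths must be non-negative and the ratio positive")
    # Round up: a partly covered final stretch still needs one monomer.
    return (fragment_nt + nt_per_monomer - 1) // nt_per_monomer

monomers_for_1kb = reca_monomers_to_coat(1000)
```

On this estimate a 1,000-nucleotide single-stranded fragment would bind roughly 334
monomers.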

Before contacting with recA or other recombinase, fragments are often
denatured, e.g., by heat-treatment. RecA protein is then added at a concentration of about
1-10 µM. After incubation, the recA-coated single-stranded DNA is introduced into recipient
cells by conventional methods, such as chemical transformation or electroporation. In general,
it can be desirable to coat the DNA with a RecA homolog isolated from the organism into
which the coated DNA is being delivered. Recombination involves several cellular factors and
the host RecA equivalent generally interacts better with other host factors than less closely
related RecA molecules. The fragments undergo homologous recombination with cognate
endogenous genes. Because of the increased frequency of recombination due to recombinase
coating, the fragments need not be introduced as components of vectors.

Fragments are sometimes coated with other nucleic acid binding proteins that
promote recombination, protect nucleic acids from degradation, or target nucleic acids to the
nucleus. Examples of such proteins include Agrobacterium virE2 (Durrenberger et al., Proc.
Natl. Acad. Sci. USA 86, 9154-9158 (1989)). Alternatively, the recipient strains are deficient
in RecD activity. Single stranded ends can also be generated by 3'-5' exonuclease activity or
restriction enzymes producing 5' overhangs.

1. MutS selection
The E. coli mismatch repair protein MutS can be used in affinity
chromatography to enrich for fragments of double-stranded DNA containing at least one base
of mismatch. The MutS protein recognizes the bubble formed by the individual strands about
the point of the mismatch. See, e.g., Hsu & Chang, WO 9320233. The strategy of affinity
enriching for partially mismatched duplexes can be incorporated into the present methods to
increase the diversity between an incoming library of fragments and corresponding cognate or
allelic genes in recipient cells.

Fig. 2 shows one scheme in which MutS is used to increase diversity. The
DNA substrates for enrichment are substantially similar to each other but differ at a few sites.
For example, the DNA substrates can represent complete or partial genomes (e.g., a
chromosome library) from different individuals with the differences being due to
polymorphisms. The substrates can also represent induced mutants of a wildtype sequence.
The DNA substrates are pooled, restriction digested, and denatured to produce fragments of
single-stranded DNA. The single-stranded DNA is then allowed to reanneal. Some single-
stranded fragments reanneal with a perfectly matched complementary strand to generate
perfectly matched duplexes. Other single-stranded fragments anneal to generate mismatched
duplexes. The mismatched duplexes are enriched from perfectly matched duplexes by MutS
chromatography (e.g., with immobilized MutS) and are then introduced into cells for
recombination with cognate endogenous genes as described above. MutS affinity
chromatography increases the proportion of fragments differing from each other and the
cognate endogenous gene. Thus, recombination between the incoming fragments and
endogenous genes results in greater diversity.

Fig. 3 shows a second strategy for MutS enrichment. In this strategy, the
substrates for MutS enrichment represent variants of a relatively short segment, for example, a
gene or cluster of genes, in which most of the different variants differ at no more than a single
nucleotide. The goal of MutS enrichment is to produce substrates for recombination that
contain more variations than sequences occurring in nature. This is achieved by fragmenting
the substrates at random to produce overlapping fragments. The fragments are denatured and
reannealed as in the first strategy. Reannealing generates some mismatched duplexes which
can be separated from perfectly matched duplexes by MutS affinity chromatography. As
before, MutS chromatography enriches for duplexes bearing at least a single mismatch. The
mismatched duplexes are then reassembled into longer fragments. This is accomplished by
cycles of denaturation, reannealing, and chain extension of partially annealed duplexes (see
Section V). After several such cycles, fragments of the same length as the original substrates
are achieved, except that these fragments differ from each other at multiple sites. These
fragments are then introduced into cells where they undergo recombination with cognate
endogenous genes.
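
The enrichment step common to both MutS strategies can be sketched as a simple filter
over reannealed duplexes. In the following illustrative Python sketch each strand is
represented by the sequence of the variant it came from, every strand is paired with
every strand to mimic random reannealing, and a duplex is retained if it has at least one
mismatch. The function names and the example variants are hypothetical.

```python
from itertools import product

def mismatches(variant_a, variant_b):
    """Count positions at which two equal-length variants differ."""
    return sum(x != y for x, y in zip(variant_a, variant_b))

def muts_enrich(variants):
    """Return (duplexes retained by MutS, total duplexes formed)."""
    duplexes = list(product(variants, repeat=2))
    retained = [(a, b) for a, b in duplexes if mismatches(a, b) >= 1]
    return retained, len(duplexes)

# Assumed pool: a wildtype sequence and two single-site variants.
retained, total = muts_enrich(["ACGT", "ACGA", "TCGT"])
```

With three variants, nine duplexes form; the three self-paired (perfectly matched)
duplexes are discarded and the six mismatched duplexes are retained.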

2. Positive Selection For Allelic Exchange
The invention further provides methods of enriching for cells bearing modified
genes relative to the starting cells. This can be achieved by introducing a DNA fragment
library (e.g., a single specific segment or a whole or partial genomic library) in a suicide vector
(i.e., lacking a functional replication origin in the recipient cell type) containing both positive
and negative selection markers. Optionally, multiple fragment libraries from different sources
(e.g., B. subtilis, B. licheniformis and B. cereus) can be cloned into different vectors bearing
different selection markers. Suitable positive selection markers include neoR, kanamycinR,
hyg, hisD, gpt, ble, tetR. Suitable negative selection markers include hsv-tk, hprt, gpt, SacB,
ura3 and cytosine deaminase. A variety of examples of conditional replication vectors,
mutations affecting vector replication, limited host range vectors, and counterselectable
markers are found in Berg and Berg, supra, and LaRossa, ibid. and the references therein.

In one example, a plasmid with R6K and f1 origins of replication, a positively
selectable marker (beta-lactamase), and a counterselectable marker (B. subtilis sacB) was
used. Following M13 transduction, plasmids containing cloned genes were efficiently
recombined into the chromosomal copy of that gene in a rep mutant E. coli strain.

Another strategy for applying negative selection is to include a wildtype rpsL
gene (encoding ribosomal protein S12) in a vector for use in cells having a mutant rpsL gene
conferring streptomycin resistance. The mutant form of rpsL is recessive in cells having
wildtype rpsL. Thus, selection for Sm resistance selects against cells having a wildtype copy
of rpsL. See Skorupski & Taylor, Gene 169, 47-52 (1996). Alternatively, vectors bearing only
a positive selection marker can be used with one round of selection for cells expressing the
marker, and a subsequent round of screening for cells that have lost the marker (e.g.,
screening for drug sensitivity). The screen for cells that have lost the positive selection marker
is equivalent to screening against expression of a negative selection marker. For example,
Bacillus can be transformed with a vector bearing a CAT gene and a sequence to be
integrated. See Harwood & Cutting, Molecular Biological Methods for Bacillus, at pp. 31-
33. Selection for chloramphenicol resistance isolates cells that have taken up vector. After a
suitable period to allow recombination, cells are screened for those that have
lost the CAT gene. About 50% of such cells will have undergone recombination with the
sequence to be integrated.

Suicide vectors bearing a positive selection marker and optionally a negative
selection marker and a DNA fragment can integrate into host chromosomal DNA by a single
crossover at a site in chromosomal DNA homologous to the fragment. Recombination
generates an integrated copy of the suicide vector flanked by direct repeats. In some
cells, subsequent recombination between the repeats results in excision of the vector and either
acquisition of a desired mutation from the vector by the genome or restoration of the genome
to wildtype.

In the present methods, after transfer of the gene library cloned in a suitable
vector, positive selection is applied for expression of the positive selection marker. Because
nonintegrated copies of the suicide vector are rapidly eliminated from cells, this selection
enriches for cells that have integrated the vector into the host chromosome. The cells
surviving positive selection can then be propagated and subjected to negative selection, or
screened for loss of the positive selection marker. Negative selection selects against cells
expressing the negative selection marker. Thus, cells that have retained the integrated vector
express the negative marker and are selectively eliminated. The cells surviving both rounds of
selection are those that initially integrated and then eliminated the vector. This
process diversifies by a single exchange of genetic information. However, if the process is
repeated, either with the same vectors or with a library of fragments generated by PCR of
pooled DNA from the enriched recombinant population, the diversity of targeted
genes is enhanced exponentially with each round of recombination. This process can be
repeated recursively, with selection being performed as desired.
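
The sequence of positive selection, propagation with excision, and negative selection can
be sketched as a small state model. The following Python example is illustrative only:
the `Cell` class, its fields, and the three hand-picked outcomes are hypothetical and
simply trace which cells survive each step.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    vector_integrated: bool       # expresses both vector-borne markers
    has_new_allele: bool = False  # genome acquired the incoming allele

def positive_selection(cells):
    """Keep cells expressing the positive marker (vector integrated)."""
    return [c for c in cells if c.vector_integrated]

def excise(cell, acquires_allele):
    """Second crossover between the repeats removes the vector."""
    return Cell(vector_integrated=False, has_new_allele=acquires_allele)

def negative_selection(cells):
    """Eliminate cells still expressing the negative marker."""
    return [c for c in cells if not c.vector_integrated]

cells = [Cell(False), Cell(True), Cell(True), Cell(True)]
integrants = positive_selection(cells)
propagated = [
    excise(integrants[0], True),   # excision leaves the new allele
    excise(integrants[1], False),  # excision restores wildtype
    integrants[2],                 # vector retained
]
survivors = negative_selection(propagated)
```

Only the two cells that integrated and then lost the vector survive both steps, and one
of them carries the exchanged allele.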

3. Individualized Optimization of Genes
In general, the above methods do not require knowledge of the number of
genes to be optimized, their map location or their function. However, in some instances,
where this information is available for one or more genes, it can be exploited. For example, if
the property to be acquired by evolution is enhanced recombination of cells, one gene likely to
be important is recA, even though many other genes, known and unknown, may make
additional contributions. In this situation, the recA gene can be evolved, at least in part,
separately from other candidate genes. The recA gene can be evolved by any of the methods
of recursive recombination described in Section V. Briefly, this approach entails obtaining
diverse forms of a recA gene, allowing the forms to recombine, selecting recombinants having
improved properties, and subjecting the recombinants to further cycles of recombination and
selection. At any point in the individualized improvement of recA, the diverse forms of recA
can be pooled with fragments encoding other genes in a library to be used in the general
methods described herein. In this way, the library is seeded to contain a higher proportion of
variants in a gene known to be important to the property sought to be acquired than would
otherwise be the case.

In one example (illustrated in Fig. 20B), a plasmid is constructed carrying a
non-functional (mutated) version of a chromosomal gene such as URA3, where the wild-type
gene confers sensitivity to a drug (in this case 5-fluoroorotic acid). The plasmid also carries a
selectable marker (resistance to another drug such as kanamycin), and a library of recA
variants. Transformation of the plasmid into the cell results in expression of the recA variants,
some of which will catalyze homologous recombination at an increased rate. Those cells in
which homologous recombination occurred are resistant to the selectable drug on the plasmid,
and to 5-fluoroorotic acid because of the disruption of the chromosomal copy of this gene.
The recA variants which give the highest rates of homologous recombination are the most
highly represented in a pool of homologous recombinants. The mutant recA genes can be
isolated from this pool by PCR, re-shuffled, cloned back into the plasmid and the process
repeated. Other sequences can be inserted in place of recA to evolve other components of the
homologous recombination system.

4. Harvesting DNA Substrates for Shuffling
In some shuffling methods, DNA substrates are isolated from natural sources
and are not easily manipulated by DNA modifying or polymerizing enzymes due to recalcitrant
impurities, which poison enzymatic reactions. Such difficulties can be avoided by processing
DNA substrates through a harvesting strain. The harvesting strain is typically a cell type with
natural competence and a capacity for homologous recombination between sequences with
substantial diversity (e.g., sequences exhibiting only 75% sequence identity). The harvesting
strain bears a vector encoding a negative selection marker flanked by two segments
respectively complementary to two segments flanking a gene or other region of interest in the
DNA from a target organism. The harvesting strain is contacted with fragments of DNA from
the target organism. Fragments are taken up by natural competence, or other methods
described herein, and a fragment of interest from the target organism recombines with the
vector of the harvesting strain causing loss of the negative selection marker. Selection against
the negative marker allows isolation of cells that have taken up the fragment of interest.
Shuffling can be carried out in the harvester strain, or the vector can be
isolated from the harvester strain for in vitro shuffling or transfer to a different cell type for in
vivo shuffling. Alternatively, the vector can be transferred to a different cell type by
conjugation, protoplast fusion or electrofusion. An example of a suitable harvester strain is
Acinetobacter calcoaceticus mutS. Melnikov and Youngman, (1999) Nucl Acid Res
27(4):1056-1062. This strain is naturally competent and takes up DNA in a nonsequence-
specific manner. Also, because of the mutS mutation, this strain is capable of homologous
recombination of sequences showing only 75% sequence identity.

IV. APPLICATIONS

A. RECOMBINOGENICITY

One goal of whole cell evolution is to generate cells having improved capacity
for recombination. Such cells are useful for a variety of purposes in molecular genetics
including the in vivo formats of recursive sequence recombination described in Section V.
Almost thirty genes (e.g., recA, recB, recC, recD, recE, recF, recG, recO, recQ, recR, recT,
[...] ruvB, ruvC, sbcB, ssb, topA, [...] uvrD, E, recL, mutD, mutH, mutL,
mutT, [...]) and DNA sites (e.g., chi, recN, [...]) involved in genetic recombination
have been identified in E. coli, and cognate forms of several of these genes have been found in
other organisms (e.g., rad51, rad55-rad57, Dmc1 in yeast (see Kowalczykowski et al.,
Microbiol. Rev. 58, 401-465 (1994); Kowalczykowski & Zarling, supra) and human homologs
of Rad51 and Dmc1 have been identified (see Sandler et al., Nucl. Acids Res. 24, 2125-2132
(1996))). At least some of the E. coli genes, including recA, are functional in mammalian cells
and can be targeted to the nucleus as a fusion with SV40 large T antigen nuclear targeting
sequence (Reiss et al., Proc. Natl. Acad. Sci. 93, 3094-3098 (1996)). Further,
mutations in mismatch repair genes, such as mutL, mutS, mutH, mutT, relax homology
requirements and allow recombination between more diverged sequences (Rayssiguier et al.,
Nature 342, 396-401 (1989)). The extent of recombination between divergent strains can be
enhanced by impairing mismatch repair genes and stimulating SOS genes. Such can be
achieved by use of appropriate mutant strains and/or growth under conditions of metabolic
stress, which have been found to stimulate SOS and inhibit mismatch repair genes. Vulic et
al., Proc. Natl. Acad. Sci. USA 94 (1997). In addition, this can be achieved by impairing the
products of mismatch repair genes by exposure to selective inhibitors.

Starting substrates for recombination are selected according to the general
principles described above. That is, the substrates can be whole genomes or fractions thereof
containing recombination genes or sites. Large libraries of essentially random fragments can
be seeded with collections of fragments constituting variants of one or more known
recombination genes, such as recA. Alternatively, libraries can be formed by mixing variant
forms of the various known recombination genes and sites.

The library of fragments is introduced into the recipient cells to be improved
and recombination occurs, generating modified cells. The recipient cells preferably contain a
marker gene whose expression has been disabled in a manner that can be corrected by
recombination. For example, the cells can contain two copies of a marker gene bearing
mutations at different sites, which copies can recombine to generate the wildtype gene. A
suitable marker gene is green fluorescent protein. A vector can be constructed encoding one
copy of GFP having stop codons near the N-terminus, and another copy of GFP having
stop codons near the C-terminus of the protein. The distance between the stop codons at the
respective ends of the molecule is 500 bp and about 25% of recombination events result in
active GFP. Expression of GFP in a cell signals that a cell is capable of homologous
recombination to recombine in between the stop codons to generate a contiguous coding
sequence. By screening for cells expressing GFP, one enriches for cells having the highest
capacity for recombination. The same type of screen can be used following subsequent rounds
of recombination. However, unless the selection marker used in previous round(s) was
present on a suicide vector, subsequent round(s) should employ a second disabled screening
marker within a second vector bearing a different origin of replication or a different positive
selection marker to vectors used in the previous rounds.
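
The logic of the two-copy GFP reporter can be sketched as a test of where a single
crossover falls relative to the two sets of stop codons. In the following illustrative
Python sketch the coordinates (stops at positions 100 and 600, i.e., 500 bp apart) and the
function name are assumptions; the sketch does not attempt to model the stated frequency
of about 25%.

```python
def crossover_restores_gfp(crossover, n_stop, c_stop):
    """True if one crossover yields a coding sequence free of both stops.

    The recombinant takes positions before `crossover` from the copy whose
    stop lies near the C-terminus (at c_stop) and the remaining positions
    from the copy whose stop lies near the N-terminus (at n_stop).
    """
    upstream_clean = crossover <= c_stop   # 5' part excludes the C-terminal stop
    downstream_clean = crossover > n_stop  # 3' part excludes the N-terminal stop
    return upstream_clean and downstream_clean

# Assumed coordinates: stops 500 bp apart.
results = [crossover_restores_gfp(x, n_stop=100, c_stop=600) for x in (50, 350, 650)]
```

Only the crossover that falls between the two stop positions produces a contiguous
coding sequence in this model.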

B. MULTIGENOMIC COPY NUMBER - GENE REDUNDANCY
The majority of bacterial cells in stationary phase cultures grown in rich media
contain two, four or eight genomes. In minimal medium the cells contain one or two genomes.
The number of genomes per bacterial cell thus depends on the growth rate of the cell as it
enters stationary phase. This is because rapidly growing cells contain multiple replication
forks, resulting in several genomes in the cells after termination. The number of genomes is
strain dependent, although all strains tested have more than one chromosome in stationary
phase. The number of genomes in stationary phase cells decreases with time. This appears to
be due to fragmentation and degradation of entire chromosomes, similar to apoptosis in
mammalian cells. This fragmentation of genomes in cells containing multiple genome copies
results in massive recombination and mutagenesis. Useful mutants may find ways to use
energy sources that will allow them to continue growing. Multigenome or gene-redundant
cells are much more resistant to mutagenesis and can be improved for a selected trait faster.
Some cell types, such as Deinococcus radians (Daly and Minton, J. Bacteriol.
177, 5495-5505 (1995)) exhibit polyploidy throughout the cell cycle. This cell type is highly
radiation resistant due to the presence of many copies of the genome. High frequency
recombination between the genomes allows rapid removal of mutations induced by a variety of
DNA damaging agents.

A goal of the present methods is to evolve other cell types to have increased
genome copy number akin to that of Deinococcus radiodurans. Preferably, the increased copy
number is maintained through all or most of its cell cycle in all or most growth conditions.
The presence of multiple genome copies in such cells results in a higher frequency of
homologous recombination in these cells, both between copies of a gene in different genomes
within the cell, and between a genome within the cell and a transfected fragment. The
increased frequency of recombination allows the cells to be evolved more quickly to acquire
other useful characteristics.

Starting substrates for recombination can be a diverse library of genes only a
few of which are relevant to genomic copy number, a focused library formed from variants of
gene(s) known or suspected to have a role in genomic copy number, or a combination of the
two. Increased copy number can be achieved by
evolution of genes involved in replication and cell septation such that cell septation is inhibited
without impairing replication. Genes involved in replication include tus, xerC, xerD, dif,
gyrA, gyrB, parE, parC, dif, TerA, TerB, TerC, TerD, TerE, TerF, and genes influencing
chromosome partitioning and gene copy number include minD, mukA (tolC), mukB, mukC,
mukD, spoOJ, spoIIIE (Wake & Errington, Annu. Rev. Genet. 29, 41-67 (1995)). A useful
source of substrates is the genome of a cell type such as Deinococcus radiodurans known to have
the desired phenotype of multigenomic copy number. As well as, or instead of, the above
substrates, fragments encoding protein or antisense RNA inhibitors to genes known to be
involved in cell septation can also be used.

In nature, the existence of multiple genomic copies in a cell type would usually
not be advantageous due to the greater nutritional requirements needed to maintain this copy
number. However, artificial conditions can be devised to select for high copy number.
Modified cells having recombinant genomes are grown in rich media (in which conditions
multicopy number should not be a disadvantage) and exposed to a mutagen, such as ultraviolet
or gamma irradiation or a chemical mutagen, e.g., mitomycin, nitrous acid, photoactivated
psoralens, alone or in combination, which induces DNA breaks amenable to repair by
recombination. These conditions select for cells having multicopy number due to the greater
efficiency with which mutations can be excised. Modified cells surviving exposure to mutagen
are enriched for cells with multiple genome copies. If desired, selected cells can be
individually analyzed for genome copy number (e.g., by quantitative hybridization with
appropriate controls). Some or all of the collection of cells surviving selection provide the
substrates for the next round of recombination. In addition, individual cells can be sorted
using a cell sorter for those cells containing more DNA, e.g., using DNA specific fluorescent
compounds or sorting for increased size using light dispersion. Eventually cells are evolved
that have at least 2, 4, 6, 8 or 10 copies of the genome throughout the cell cycle. In a similar
manner, protoplasts can also be recombined.
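
The cell-sorting step described above can be illustrated with a minimal sketch. It assumes
that each cell's DNA-specific fluorescence scales linearly with genome copy number and that
the signal of a single-genome reference cell is known; both are simplifying assumptions, and
the function names and numbers are hypothetical rather than taken from this specification.

```python
def estimated_copies(signal, single_genome_signal):
    """Estimate genome copies, assuming fluorescence is proportional to DNA content."""
    if single_genome_signal <= 0:
        raise ValueError("reference signal must be positive")
    return signal / single_genome_signal


def gate_high_dna(signals, single_genome_signal, min_copies):
    """Keep only the cells whose estimated copy number meets the sorting threshold."""
    return [
        s for s in signals
        if estimated_copies(s, single_genome_signal) >= min_copies
    ]


# Hypothetical fluorescence readings (arbitrary units) for four cells.
readings = [100, 210, 390, 800]
enriched = gate_high_dna(readings, single_genome_signal=100, min_copies=4)
```

Cells passing the gate would supply the substrates for the next round of recombination; in
practice the gate would be set empirically against appropriate controls.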

C. SECRETION

The protein (or metabolite) secretion pathways of bacterial and eukaryotic cells
can be evolved to export desired molecules more efficiently, such as for the manufacturing of
protein pharmaceuticals, small molecule drugs or specialty chemicals. Improvements in
efficiency are particularly desirable for proteins requiring multisubunit assembly (such as
antibodies) or extensive posttranslational modification before secretion.

The efficiency of secretion may depend on a number of genetic sequences
including a signal peptide coding sequence, sequences encoding protein(s) that cleave or
otherwise recognize the coding sequence, and the coding sequence of the protein being
secreted. The latter may affect folding of the protein and the ease with which it can integrate
into and traverse membranes. The bacterial secretion pathway in E. coli includes the SecA,
SecB, SecE, SecD and SecF genes. In Bacillus subtilis, the major genes are secA, secD, secE,
secF, secY, ffh, ftsY together with five signal peptidase genes (sipS, sipT, sipU, sipV and
sipW) (Kunst et al., supra). For proteins requiring posttranslational modification, evolution of
genes effecting such modification may contribute to improved secretion. Likewise, genes with
expression products having a role in assembly of multisubunit proteins (e.g., chaperonins) may
also contribute to improved secretion.

Selection of substrates for recombination follows the general principles
discussed above. In this case, the focused libraries referred to above comprise variants of the
known secretion genes. For evolution of prokaryotic cells to express eukaryotic proteins, the
initial substrates for recombination are often obtained at least in part from eukaryotic sources.
Incoming fragments can undergo recombination both with chromosomal DNA in recipient
cells and with the screening marker construct present in such cells (see below). The latter
form of recombination is important for evolution of the signal coding sequence incorporated in
the screening marker construct. Improved secretion can be screened by the inclusion of a
marker construct in the cells being evolved. The marker construct encodes a marker gene,
operably linked to expression sequences, and usually operably linked to a signal peptide coding
sequence. The marker gene is sometimes expressed as a fusion protein with a recombinant
protein of interest. This approach is useful when one wants to evolve the recombinant protein
coding sequence together with secretion genes.

In one variation, the marker gene encodes a product that is toxic to the cell
containing the construct unless the product is secreted. Suitable toxin proteins include
diphtheria toxin and ricin toxin. Propagation of modified cells bearing such a construct selects
for cells that have evolved to improve secretion of the toxin. Alternatively, the marker gene
can encode a ligand to a known receptor, and cells bearing the ligand can be detected by
FACS using labeled receptor. Optionally, such a ligand can be operably linked to a
phospholipid anchoring sequence that binds the ligand to the cell membrane surface following
secretion. (See commonly owned, copending 08/309,345.) In a further variation, secreted
protein can be maintained in proximity with the cell secreting it by distributing
individual cells into agar drops.
Secreted protein is confined within the agar matrix and can be detected by, e.g., FACS. In
another variation, a protein of interest is expressed as a fusion protein together with
β-lactamase or alkaline phosphatase. These enzymes metabolize commercially available
chromogenic substrates (e.g., X-gal), but do so only after secretion into the periplasm.
Appearance of colored substrate in a colony of cells therefore indicates capacity to secrete the
fusion protein, and the intensity of color is related to the efficiency of secretion.

The cells identified by these screening and selection methods have the capacity
to secrete increased amounts of protein. This capacity may be attributable to increased
secretion and increased expression, or to increased secretion alone.

1. Expression

Cells can also be evolved to acquire increased expression of a recombinant
protein. The level of expression is, of course, highly dependent on the construct from which
the recombinant protein is expressed and the regulatory sequences, such as the promoter,
enhancer(s) and transcription termination site contained therein. Expression can also be
affected by a large number of host genes having roles in transcription, posttranslational
modification and translation. In addition, host genes involved in synthesis of ribonucleotide
and amino acid monomers for transcription and translation may have indirect effects on
efficiency of expression. Selection of substrates for recombination follows the general
principles discussed above. In this case, focused libraries comprise variants of genes known to
have roles in expression. For evolution of prokaryotic cells to express eukaryotic proteins, the
initial substrates for recombination are often obtained, at least in part, from eukaryotic
sources; that is, eukaryotic genes encoding proteins such as chaperonins involved in secretion
and/or assembly of proteins. Incoming fragments can undergo recombination both with
chromosomal DNA in recipient cells and with the screening marker construct present in such
cells (see below).

Screening for improved expression can be effected by including a reporter
construct in the cells being evolved. The reporter construct expresses (and usually secretes) a
reporter protein, such as GFP, which is easily detected and nontoxic. The reporter protein can
be expressed alone or together with a protein of interest as a fusion protein. If the reporter
protein is secreted, the screening effectively selects for cells having either improved secretion or
improved expression, or both.

2. Plant Cells

A further application of recursive sequence recombination is the evolution of
plant cells, and transgenic plants derived from the same, to acquire resistance to pathogenic
diseases (fungi, viruses and bacteria), insects, chemicals (such as salt, selenium, pollutants,
pesticides, herbicides, or the like), including, e.g., atrazine or glyphosate, or to modify
chemical composition, yield or the like. The substrates for recombination can again be whole
genomic libraries, fractions thereof or focused libraries containing variants of gene(s) known
or suspected to confer resistance to one of the above agents. Frequently, library fragments are
obtained from a different species from the plant being evolved.

The DNA fragments are introduced into plant tissues, cultured plant cells, plant
microspores, or plant protoplasts by standard methods including electroporation (Fromm et al.,
Proc. Natl. Acad. Sci. USA 82, 5824 (1985)), infection by viral vectors such as cauliflower
mosaic virus (CaMV) (Hohn et al., Molecular Biology of Plant Tumors, (Academic Press,
New York, 1982) pp. 549-560; Howell, US 4,407,956), high velocity ballistic penetration by
small particles with the nucleic acid either within the matrix of small beads or particles, or on
the surface (Klein et al., Nature 327, 70-73 (1987)), use of pollen as vector (WO 85/01856),
or use of Agrobacterium tumefaciens or A. rhizogenes carrying a T-DNA plasmid in which
DNA fragments are cloned. The T-DNA plasmid is transmitted to plant cells upon infection
by Agrobacterium tumefaciens, and a portion is stably integrated into the plant genome
(Horsch et al., Science 233, 496-498 (1984); Fraley et al., Proc. Natl. Acad. Sci. USA 80,
4803 (1983)).

Diversity can also be generated by genetic exchange between plant protoplasts
according to the same principles described below for fungal protoplasts. Procedures for
formation and fusion of plant protoplasts are described by Takahashi et al., US 4,677,066;
Akagi et al., US 5,360,725; Shimamoto et al., US 5,250,433; Cheney et al., US 5,426,040.

After a suitable period of incubation to allow recombination to occur, and for
expression of recombinant genes, the plant cells are contacted with the agent to which
resistance is to be acquired, and surviving plant cells are collected. Some or all of these plant
cells can be subject to a further round of recombination and screening. Eventually, plant cells
having the required degree of resistance are obtained.

These cells can then be cultured into transgenic plants. Plant regeneration from
cultured protoplasts is described in Evans et al., "Protoplast Isolation and Culture," Handbook
of Plant Cell Cultures 1, 124-176 (MacMillan Publishing Co., New York, 1983); Davey,
"Recent Developments in the Culture and Regeneration of Plant Protoplasts," Protoplasts,
(1983) pp. 12-29, (Birkhauser, Basel 1983); Dale, "Protoplast Culture and Plant Regeneration
of Cereals and Other Recalcitrant Crops," Protoplasts (1983) pp. 31-41, (Birkhauser, Basel
1983); Binding, "Regeneration of Plants," Plant Protoplasts, pp. 21-73, (CRC Press, Boca
Raton, 1985).

In a variation of the above method, one or more preliminary rounds of
recombination and screening can be performed in bacterial cells according to the same general 
strategy as described for plant cells. More rapid evolution can be achieved in bacterial cells 
due to their greater growth rate and the greater efficiency with which DNA can be introduced 
into such cells. After one or more rounds of recombination/screening, a DNA fragment library 
is recovered from bacteria and transformed into the plant cells. The library can either be a 
complete library or a focused library. A focused library can be produced by amplification from 
primers specific for plant sequences, particularly plant sequences known or suspected to have 
a role in conferring resistance.

3. Example: Concatemeric Assembly of Atrazine-Catabolizing Plasmid

Pseudomonas atrazine catabolizing genes AtzA and AtzB were subcloned from
pMD1 (de Souza et al., Appl. Environ. Microbiol. 61, 3373-3378 (1995); de Souza et al., J.
Bacteriol. 178, 4894-4900 (1996)) into pUC18. A 1.9 kb AvaI fragment containing AtzA
was end-filled and inserted into an AvaI site of pUC18. A 3.9 kb ClaI fragment containing
AtzB was end-filled and cloned into the HincII site of pUC18. AtzA was then excised from
pUC18 with EcoRI and BamHI, AtzB with BamHI and HindIII, and the two inserts were
co-ligated into pUC18 digested with EcoRI and HindIII. The result was a 5.8 kb insert
containing AtzA and AtzB in pUC18 (total plasmid size 8.4 kb).

Recursive sequence recombination was performed as follows. The entire 8.4
kb plasmid was treated with DNaseI in 50 mM Tris-Cl pH 7.5, 10 mM MnCl2, and fragments
between 500 and 2000 bp were gel purified. The fragments were assembled in a PCR reaction
using Tth-XL enzyme and buffer from Perkin Elmer, 2.5 mM MgOAc, 400 µM dNTPs and
serial dilutions of DNA fragments. The assembly reaction was performed in an MJ Research
"DNA Engine" programmed with the following cycles: 1) 94°C, 20 seconds; 2) 94°C, 15
seconds; 3) 40°C, 30 seconds; 4) 72°C, 30 seconds + 2 seconds per cycle; 5) go to step 2, 39
more times; 6) 4°C.
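
The timing of the cycling program above can be worked out with a short sketch. It assumes
that the 2-second increment is first applied in the second cycle (so the first extension is
30 seconds), and it ignores ramp times and the final 4°C hold; the function names are
illustrative only.

```python
def extension_seconds(cycle, base=30, increment=2):
    """Extension time at 72°C for a 1-indexed cycle (assumed: no increment in cycle 1)."""
    return base + increment * (cycle - 1)


def total_program_seconds(cycles=40, initial_denature=20, denature=15, anneal=30):
    """Total programmed hold time: step 1 plus steps 2-4 repeated `cycles` times."""
    cycling = sum(
        denature + anneal + extension_seconds(n) for n in range(1, cycles + 1)
    )
    return initial_denature + cycling


last_extension = extension_seconds(40)   # extension time in the final cycle
total_minutes = total_program_seconds() / 60
```

Step 5 ("go to step 2, 39 more times") gives 40 passes through steps 2 to 4 in total, which
is why `cycles` defaults to 40.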

The AtzA and AtzB genes were not amplified from the assembly reaction using
the polymerase chain reaction. Instead, DNA was purified from the reaction by phenol
extraction and ethanol precipitation, and the assembled DNA was then digested with a restriction
enzyme that linearized the plasmid (KpnI: the KpnI site in pUC18 was lost during subcloning,
leaving only the KpnI site in AtzA). Linearized plasmid was gel-purified, self-ligated
overnight and transformed into E. coli strain NM522. (The choice of host strain was relevant:
very little plasmid of poor quality was obtained from a number of other commercially available
strains including TG1, DH10B, DH12S.)

Serial dilutions of the transformation reaction were plated onto LB plates
containing 50 µg/ml ampicillin; the remainder of the transformation was made 25% in glycerol
and frozen at -80°C. Once the transformed cells were titered, the frozen cells were plated at a
density of between 200 and 500 colonies per plate on 150 mm diameter plates containing
500 µg/ml atrazine and grown at 37°C.
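
The titering and plating arithmetic described above can be sketched as follows. The colony
count, dilution factor and volumes are hypothetical illustrations, and the sketch assumes the
titer is expressed per microliter of the frozen glycerol stock.

```python
def titer_cfu_per_ul(colonies, dilution_factor, plated_volume_ul):
    """Colony-forming units per microliter of undiluted stock, from one dilution plate."""
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ul


def volume_for_target_ul(titer, target_colonies):
    """Volume of stock to spread on one plate to reach the target colony count."""
    if titer <= 0:
        raise ValueError("titer must be positive")
    return target_colonies / titer


# Hypothetical: 150 colonies from 100 µl of a 1:100 dilution.
titer = titer_cfu_per_ul(colonies=150, dilution_factor=100, plated_volume_ul=100)
# Volumes bracketing the 200-500 colonies-per-plate range given in the text.
low_volume = volume_for_target_ul(titer, 200)
high_volume = volume_for_target_ul(titer, 500)
```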

Atrazine at 500 µg/ml forms an insoluble precipitate. The products of the AtzA
and AtzB genes transform atrazine into a soluble product. Cells containing the wild type AtzA
and AtzB genes in pUC18 will thus be surrounded by a clear halo where the atrazine has been
degraded. The more active the AtzA and AtzB enzymes, the more rapidly a clear halo will
form and grow on atrazine-containing plates. Positives were picked as those colonies that
most rapidly formed the largest clear zones. The (approximately) 40 best colonies were
picked, pooled, grown in the presence of 50 µg/ml ampicillin, and plasmid was prepared from them.
The entire process (from DNase treatment to plating on atrazine plates) was repeated 4 times
with 2000-4000 colonies/cycle.

A modification was made in the fourth round. Cells were plated on both 500
µg/ml atrazine and 500 µg/ml of the atrazine analogue terbutylazine, which was undegradable
by the wild type AtzA and AtzB genes. Positives were obtained that degraded both
compounds. The atrazine chlorohydrolase activity (product of the AtzA gene) was 10-100 fold
higher than that produced by the wild type gene.

D. PLANT GENOME SHUFFLING

Plant genome shuffling allows recursive cycles to be used for the introduction 
and recombination of genes or pathways that confer improved properties to desired plant 
species. Any plant species, including weeds and wild cultivars, showing a desired trait, such as 
herbicide resistance, salt tolerance, pest resistance, or temperature tolerance, can be used as 
the source of DNA that is introduced into the crop or horticultural host plant species. 

Genomic DNA prepared from the source plant is fragmented (e.g., by DNaseI,
restriction enzymes, or mechanically) and cloned into a vector suitable for making plant
genomic libraries, such as pGA482 (An, G., 1995, Methods Mol. Biol. 44:47-58). This vector
contains the A. tumefaciens left and right borders needed for gene transfer to plant cells and
antibiotic markers for selection in E. coli, Agrobacterium, and plant cells. A multicloning site
is provided for insertion of the genomic fragments. A cos sequence is present for the efficient
packaging of DNA into bacteriophage lambda heads for transfection of the primary library into
E. coli. The vector accepts DNA fragments of 25-40 kb.
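
As a rough guide to how many primary clones a library with 25-40 kb inserts might need, the
standard Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f) can be applied, where f is the
insert size divided by the genome size and P is the desired probability that any given
sequence is represented. This estimate is not part of the method described here, and the
genome size used below is a hypothetical figure chosen only for illustration.

```python
import math


def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Clarke-Carbon estimate of library size for a given coverage probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be smaller than the genome")
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


# Hypothetical 100 Mb source genome, at both ends of the vector's insert range.
small_inserts = clones_needed(100_000_000, 25_000)
large_inserts = clones_needed(100_000_000, 40_000)
```

Larger inserts reduce the number of clones required, which is one practical reason to favor
a vector with a large cloning capacity.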

The primary library can also be directly electroporated into an A. tumefaciens
or A. rhizogenes strain that is used to infect and transform host plant cells (Main, GD et al.,
1995, Methods Mol. Biol. 44:405-412). Alternatively, DNA can be introduced by
electroporation or PEG-mediated uptake into protoplasts of the recipient plant species (Bilang
et al. (1994) Plant Mol. Biol. Manual, Kluwer Academic Publishers, A1: 1-16) or by particle
bombardment of cells or tissues (Christou, ibid, A2: 1-15). If necessary, antibiotic markers in
the T-DNA region can be eliminated, as long as selection for the trait is possible, so that the
final plant products contain no antibiotic genes.

Stably transformed whole cells acquiring the trait are selected on solid or liquid
media containing the agent to which the introduced DNA confers resistance or tolerance. If
the trait in question cannot be selected for directly, transformed cells can be selected with
antibiotics and allowed to form callus or regenerated to whole plants and then screened for the
desired property.

The second and further cycles consist of isolating genomic DNA from each
transgenic line and introducing it into one or more of the other transgenic lines. In each
round, transformed cells are selected or screened for incremental improvement. To speed the
process of using multiple cycles of transformation, plant regeneration can be deferred until the
last round. Callus tissue generated from the protoplasts or transformed tissues can serve as a
source of genomic DNA and new host cells. After the final round, fertile plants are
regenerated and the progeny are selected for homozygosity of the inserted DNAs. Ultimately,
a new plant is created that carries multiple inserts which additively or synergistically combine
to confer high levels of the desired trait. Alternatively, microspores can be isolated as
homozygotes generated from spontaneous diploids.

In addition, the introduced DNA that confers the desired trait can be traced
because it is flanked by known sequences in the vector. Either PCR or plasmid rescue is used
to isolate the sequences and characterize them in more detail. Long PCR (Foord, OS and
Rose, EA, 1995, PCR Primer: A Laboratory Manual, CSHL Press, pp 63-77) of the full 25-40
kb insert is achieved with the proper reagents and techniques using as primers the T-DNA
border sequences. If the vector is modified to contain the E. coli origin of replication and an
antibiotic marker between the T-DNA borders, a rare cutting restriction enzyme, such as NotI
or SfiI, that cuts only at the ends of the inserted DNA is used to create fragments containing
the source plant DNA that are then self-ligated and transformed into E. coli where they
replicate as plasmids. The total DNA or subfragment of it that is responsible for the
transferred trait can be subjected to in vitro evolution by DNA shuffling. The shuffled library
can be reiteratively recombined by any method herein and then introduced into host plant cells
and screened for improvement of the trait. In this way, single and multigene traits can be
transferred from one species to another and optimized for higher expression or activity leading
to whole organism improvement. This entire process can also be reiteratively repeated.

Alternatively, the cells can be transformed microspores with the regenerated
haploid plants being screened directly for improved traits as noted below.

E. MICROSPORE MANIPULATION

Microspores are haploid (1n) male spores that develop into pollen grains.
Anthers contain a large number of microspores in early-uninucleate to first-mitosis stages.
Microspores have been successfully induced to develop into plants for most species, such as,
e.g., rice (Chen, GC 1977 In Vitro. 13: 484-489), tobacco (Atanassov, I. et al. 1998 Plant Mol
Biol. 38: 1169-1178), Tradescantia (Savage JRK and Papworth DG. 1998 Mutat Res.
422:313-322), Arabidopsis (Park SK et al. 1998 Development. 125:3789-3799), sugar beet
(Majewska-Sawka A and Rodrigues-Garcia MJ 1996 J Cell Sci. 109:859-866), barley (Olsen
FL 1991 Hereditas 115:255-266) and oilseed rape (Boutillier KA et al. 1994 Plant Mol Biol
26:1711-1723).

The plants derived from microspores are predominantly haploid or diploid
(infrequently polyploid and aneuploid). The diploid plants are homozygous and fertile and can
be generated in a relatively short time. Microspores obtained from F1 hybrid plants represent
great diversity, thus being an excellent model for studying recombination. In addition,
microspores can be transformed with T-DNA introduced by Agrobacterium or other available
means and then regenerated into individual plants. Furthermore, protoplasts can be made from
microspores and they can be fused in a manner similar to that which occurs in fungi and bacteria.

Microspores, due to their complex ploidy and regenerating ability, provide a
tool for plant whole genome shuffling. For example, if pollens from 4 parents are collected
and pooled, and then used to randomly pollinate the parents, the progenies should have
16 possible combinations. Assuming this plant has 7 chromosomes, microspores collected
from the 16 progenies will represent 2^7 x 16 = 2048 possible chromosomal combinations. This
number is even greater if meiotic processes occur. When diploid, homozygous embryos are
generated from these microspores, in many cases, they are screened for desired phenotypes,
such as herbicide or disease resistance. In addition, for plant oil composition these embryos
can be dissected into two halves: one for analysis, the other for regeneration into a viable plant.
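
The arithmetic in the example above can be checked with a short sketch. It assumes that the
16 combinations arise from 4 pollen donors crossed with 4 seed parents, and that each
progeny's 7 chromosome pairs assort independently without crossing over; the function names
are illustrative.

```python
def cross_combinations(n_parents):
    """Parental pairings, assuming every pollen donor can fertilize every seed parent."""
    return n_parents * n_parents


def microspore_combinations(n_progeny_types, haploid_chromosomes):
    """Chromosome sets among microspores: 2 choices per chromosome pair, per progeny type."""
    return n_progeny_types * 2 ** haploid_chromosomes


progeny_types = cross_combinations(4)
chromosome_sets = microspore_combinations(progeny_types, 7)
```

Meiotic crossing over would raise the count further, as the text notes.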

Protoplasts generated from microspores (especially the haploid ones) are 
pooled and fused. Microspores obtained from plants generated by protoplast fusion are 
pooled and fused again, increasing the genetic diversity of the resulting microspores. 

Microspores can be subjected to mutagenesis in various ways, such as by
chemical mutagenesis, radiation-induced mutagenesis and, e.g., T-DNA transformation, prior
to fusion or regeneration. New mutations which are generated can be recombined through the
recursive processes described above and herein.

F. EXAMPLE: ACQUISITION OF SALT TOLERANCE

As depicted in Fig. 21, DNA from a salt tolerant plant is isolated and used to
create a genomic library. Protoplasts made from the recipient species are
transformed/transfected with the genomic library (e.g., by electroporation, Agrobacterium,
etc.). Cells are selected on media with a normally inhibitory level of NaCl. Only the cells with
newly acquired salt tolerance will grow into callus tissue. The best lines are chosen and
genomic libraries are made from their pooled DNA. These libraries are transformed into
protoplasts made from the first round transformed calli. Again, cells are selected on increased
salt concentrations. After the desired level of salt tolerance is achieved, the callus tissue can
be induced to regenerate whole plants. Progeny of these plants are typically analyzed for
homozygosity of the inserts to ensure stability of the acquired trait. At the indicated steps,
plant regeneration or isolation and shuffling of the introduced genes can be added to the
overall protocol.

G. TRANSGENIC ANIMALS 

1. Transgene Optimization

One goal of transgenesis is to produce transgenic animals, such as mice,
rabbits, sheep, pigs, goats, and cattle, secreting a recombinant protein in the milk. A transgene
for this purpose typically comprises in operable linkage a promoter and an enhancer from a
milk-protein gene (e.g., α, β, or γ casein, β-lactoglobulin, acid whey protein or α-lactalbumin),
a signal sequence, a recombinant protein coding sequence and a transcription termination site.
Optionally, a transgene can encode multiple chains of a multichain protein, such as an
immunoglobulin, in which case the two chains are usually individually operably linked to sets
of regulatory sequences. Transgenes can be optimized for expression and secretion by
recursive sequence recombination. Suitable substrates for recombination include regulatory
sequences such as promoters and enhancers from milk-protein genes from different species or
individual animals. Cycles of recombination can be performed in vitro or in vivo by any of the
formats discussed in Section V. Screening is performed in vivo on cultures of mammary-gland
derived cells, such as HC11 or MacT, transfected with transgenes and reporter constructs such
as those discussed above. After several cycles of recombination and screening, transgenes
resulting in the highest levels of expression and secretion are extracted from the mammary
gland tissue culture cells and used to transfect embryonic cells, such as zygotes and embryonic
stem cells, which are matured into transgenic animals.

2. Whole Animal Optimization

In this approach, libraries of incoming fragments are transformed into 
embryonic cells, such as ES cells or zygotes. The fragments can be variants of a gene known 
to confer a desired property, such as growth hormone. Alternatively, the fragments can be 
partial or complete genomic libraries including many genes. 

Fragments are usually introduced into zygotes by microinjection as described
by Gordon et al., Methods Enzymol. 101, 414 (1984); Hogan et al., Manipulation of the
Mouse Embryo: A Laboratory Manual (C.S.H.L. N.Y., 1986) (mouse embryo); and Hammer
et al., Nature 315, 680 (1985) (rabbit and porcine embryos); Gandolfi et al., J. Reprod. Fert.
81, 23-28 (1987); Rexroad et al., J. Anim. Sci. 66, 947-953 (1988) (ovine embryos) and
Eyestone et al., J. Reprod. Fert. 85, 715-720 (1989); Camous et al., J. Reprod. Fert. 72, 779-
785 (1984); and Heyman et al., Theriogenology 27, 59-68 (1987) (bovine embryos). Zygotes
are then matured and introduced into recipient female animals which gestate the embryo and
give birth to a transgenic offspring.



Alternatively, transgenes can be introduced into embryonic stem cells (ES).
These cells are obtained from preimplantation embryos cultured in vitro (Bradley et al.,
Nature 309, 255-258 (1984)). Transgenes can be introduced into such cells by electroporation
or microinjection. Transformed ES cells are combined with blastocysts from a non-human
animal. The ES cells colonize the embryo and in some embryos form the germ line of the
resulting chimeric animal.

Regardless of whether zygotes or ES cells are used, screening is performed on whole
animals for a desired property, such as increased size and/or growth rate. DNA is extracted
from animals having evolved toward acquisition of the desired property. This DNA is then
used to transfect further embryonic cells. These cells can also be obtained from animals that
have evolved toward the desired property in a split and pool approach. That is, DNA from
one subset of such animals is transformed into embryonic cells prepared from another subset
of the animals. Alternatively, the DNA from animals that have evolved toward acquisition of
the desired property can be transfected into fresh embryonic cells. In either alternative,
transfected cells are matured into transgenic animals, and the animals subjected to a further
round of screening for the desired property.

Fig. 4 shows the application of this approach for evolving fish toward a larger
size. Initially, a library is prepared of variants of a growth hormone gene. The variants can be
natural or induced. The library is coated with recA protein and transfected into fertilized fish
eggs. The fish eggs then mature into fish of different sizes. The growth hormone gene
fragment of genomic DNA from large fish is then amplified by PCR and used in the next round
of recombination. Alternatively, fish α-IFN is evolved to enhance resistance to viral infections
as described below.

3. Evolution of improved hormones for expression in transgenic
animals (e.g., Fish) to create animals with improved traits.

Hormones and cytokines are key regulators of size, body weight, viral
resistance and many other commercially important traits. DNA shuffling is used to rapidly
evolve the genes for these proteins using in vitro assays. This was demonstrated with the
evolution of the human alpha interferon genes to have potent antiviral activity on murine cells.
Large improvements in activity were achieved in two cycles of family shuffling of the human
IFN genes.

In general, a method of increasing resistance to virus infection in cells can be
performed by first introducing a shuffled library comprising at least one shuffled interferon
gene into animal cells to create an initial library of animal cells or animals. The initial library is
then challenged with the virus. Animal cells or animals which are resistant to the virus are
selected from the initial library, and a plurality of transgenes from a plurality of animal cells or
animals which are resistant to the virus are recovered. The plurality of transgenes is recombined
to produce an evolved library of animal cells or animals which is again challenged with the
virus. Cells or animals which are resistant to the virus are selected from the evolved library.

For example, genes evolved with in vitro assays are introduced into the
germplasm of animals or plants to create improved strains. One limitation of this procedure is
that in vitro assays are often only crude predictors of in vivo activity. However, with
improving methods for the production of transgenic plants and animals, one can now marry
whole organism breeding with molecular breeding. The approach is to introduce shuffled
libraries of hormone genes into the species of interest. This can be done with a single gene per
transgenic or with pools of genes per transgenic. Progeny are then screened for the phenotype
of interest. In this case, shuffled libraries of interferon genes (alpha IFN, for example) are
introduced into transgenic fish. The library of transgenic fish is challenged with a virus. The
most resistant fish are identified (i.e., either survivors of a lethal challenge, or those that are
deemed most 'healthy' after the challenge). The IFN transgenes are recovered by PCR and
shuffled in either a poolwise or a pairwise fashion. This generates an evolved library of IFN
genes. A second library of transgenic fish is created and the process is repeated. In this way,
IFN is evolved for improved antiviral activity in a whole organism assay.

This procedure is general and can be applied to any trait that is affected by a
gene or gene family of interest and which can be measured.

Fish interferon sequence data is available for the Japanese flatfish
(Paralichthys olivaceus) as an mRNA sequence (Tamai et al. (1993) "Cloning and expression of
flatfish (Paralichthys olivaceus) interferon cDNA." Biochim. Biophys. Acta 1174: 182-186;
see also, Tamai et al. (1993) "Purification and characterization of interferon-like antiviral
protein derived from flatfish (Paralichthys olivaceus) lymphocytes immortalized by
oncogenes." Cytotechnology 11(2): 121-131). This sequence can be used to clone out
IFN genes from this species. This sequence can also be used as a probe to clone homologous
interferons from additional species of fish. As well, additional sequence information can be
utilized to clone out more species of fish interferons. Once a library of interferons has been
cloned, these can be family shuffled to generate a library of variants.

A protein sequence of flatfish interferon is:
MIRSTNSNKS DILMNCHHLIIR YDDNSAPSGGSL FRKMTMLLKL LKLITFGQLRW
ELFVKSNTSKTS TVLSIDGSNLISL LDAPKDILDKPSCNSF QLDLLL AS S A WTLLT
ARLLNYPYPA VLLSAGVAS WLVQVP

In one embodiment, BHK-21 (a fibroblast cell line from hamster) can be
transfected with the shuffled IFN-expression plasmids. Active recombinant IFN is produced
and then purified by WGA agarose affinity chromatography (Tamai et al., 1993, Biochim. [...]
by rhabdovirus. Tamai et al. (1993) "Purification and characterization of interferon-like
antiviral protein derived from flatfish (Paralichthys olivaceus) lymphocytes immortalized by
oncogenes." Cytotechnology 11(2): 121-131).

H. WHOLE GENOME SHUFFLING IN HIGHER ORGANISMS:
POOLWISE RECURSIVE BREEDING

The present invention provides a procedure for generating large combinatorial
libraries of higher eukaryotes, plants, fish, domesticated animals, etc. In addition to the
procedures outlined above, poolwise combination of male and female gametes can also be
used to generate large diverse molecular libraries.

In one aspect, the process includes recursive poolwise matings for several
generations without any deliberate screening. This is similar to classical breeding, except that
pools of organisms, rather than pairs of organisms, are mated, thereby accelerating the
generation of genetic diversity. This method is similar to recursive fusion of a diverse
population of bacterial protoplasts, resulting in the generation of multiparent progeny
harboring genetic information from all of the starting population of bacteria. The process
described here is to perform analogous artificial or natural matings of large populations of
natural isolates, implementing a split pool mating strategy. Before mating, all of the male
gametes, i.e., pollen, sperm, etc., are isolated from the starting population and pooled. These
are then used to "self" fertilize a mixed pool of the female gametes from the same population.

The process is repeated with the subsequent progeny for several generations,
with the final progeny being a combinatorial organism library with each member having
genetic information originating from many if not all of the starting "parents." This process
generates large diverse organism libraries on which many selections and/or screens can be
imposed, and it does not require sophisticated in vitro manipulation of genes. Nevertheless, it
results in the creation of useful new strains (perhaps well diluted in the population) in a much
shorter time frame than such organisms could be generated using a classical targeted breeding
approach.

These libraries are generated relatively quickly (e.g., typically in less than three 
years for most plants of commercial interest, with six cycles or less of recursive breeding being 
sufficient to generate desired diversity). 
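The effect of recursive poolwise mating on ancestry can be illustrated with a small
simulation. It is a simplified sketch and not part of the disclosed method: every individual
is assumed able to act as either parent (as in a monoecious plant such as corn), population
size is held constant, and only pedigree ancestry is tracked, not actual meiotic inheritance.
The function names are hypothetical.

```python
import random


def pooled_generation(population, rng):
    """One poolwise mating: each offspring draws two distinct parents from the pool."""
    offspring = []
    for _ in range(len(population)):
        mother, father = rng.sample(range(len(population)), 2)  # no selfing
        offspring.append(population[mother] | population[father])
    return offspring


def founder_ancestry(n_founders, generations, rng):
    """Return, for each final individual, the set of founders in its pedigree."""
    population = [frozenset([i]) for i in range(n_founders)]
    for _ in range(generations):
        population = pooled_generation(population, rng)
    return population


if __name__ == "__main__":
    rng = random.Random(1)
    for g in range(1, 7):
        library = founder_ancestry(50, g, rng)
        mean = sum(len(a) for a in library) / len(library)
        print(f"generation {g}: mean founders per individual = {mean:.1f}")
```

In this model an individual can descend from at most 2^g founders after g generations, so six
cycles allow up to 64 founders per individual, which is in line with the statement that six
cycles or fewer can generate the desired diversity.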

An additional benefit of these methods is that the resulting libraries provide
organismal diversity in areas, such as agriculture, aquaculture, and animal husbandry, that are
currently genetically homogeneous.

Examples of these methods for several organisms are described below. 

1. Plants 

A population of plants, for example all of the different corn strains in a
commercial seed/germplasm collection, is grown and the pollen from the entire population is
harvested and pooled. This mixed pollen population is then used to "self" fertilize the same
population. Self pollination is prevented, so that the fertilization is combinatorial. The cross
results in all pairwise crosses possible within the population, and the resulting seeds represent
many of the possible outcomes of each of these pairwise crosses. The seeds from the fertilized
plants are then harvested, pooled, planted, and the pollen is again harvested, pooled, and used
to "self" fertilize the population. After only several generations, the resulting population is a
very diverse combinatorial library of corn. The seeds from this library are harvested and
screened for desirable traits, e.g., salt tolerance, growth rate, productivity, yield, disease
resistance, etc. Essentially any plant collection can be modified by this approach. Important
commercial crops include both monocots and dicots. Monocots include plants in the grass
family (Gramineae), such as plants in the subfamilies Festucoideae and Poacoideae, which
together include several hundred genera including plants in the genera Agrostis, Phleum,
Dactylis, Sorghum, Setaria, Zea (e.g., corn), Oryza (e.g., rice), Triticum (e.g., wheat), Secale
(e.g., rye), Avena (e.g., oats), Hordeum (e.g., barley), Saccharum, Poa, Festuca,
Stenotaphrum, Cynodon, Coix, the Olyreae, Phareae and many others. Plants in the family
Gramineae are particularly preferred target plants for the methods of the invention.
Additional preferred targets include other commercially important crops, e.g., from the
families Compositae (the largest family of vascular plants, including at least 1,000 genera,
including important commercial crops such as sunflower), and Leguminosae or "pea family,"
which includes several hundred genera, including many commercially valuable crops such as
pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine,
vetch, lotus, sweet clover, wisteria, and sweetpea. Common crops applicable to the methods
of the invention include Zea mays, rice, soybean, sorghum, wheat, oats, barley, millet,
sunflower, and canola.

This process can also be carried out using pollen from different species or more
divergent strains (e.g., crossing the ancient grasses with corn). Different plant species can be
forced to cross. Only a few plants from an initial cross would have to result in order to make
[...] generate pollen and eggs, each of which would represent a different meiotic outcome from
the recombination of the two genomes. The pollen would be harvested and used to "self"
pollinate the original progeny. This process would then be carried out recursively. This would
generate a large family shuffled library of two or more species, which could be subsequently
screened.

The above strategy is illustrated schematically in Figure 30.

2. Fish

The natural tendency of fish to lay their eggs outside of the body and to have a
male cover those eggs with sperm provides another opportunity for a split pooled breeding
strategy. The eggs from many different fish, e.g., salmon from different fisheries about the
world, can be harvested, pooled, and then fertilized with similarly collected and pooled
salmon sperm. The fertilization will result in all of the possible pairwise matings of the starting
population. The resulting progeny are then grown and again the sperm and eggs are harvested
and pooled, with each egg and sperm representing a different meiotic outcome of the different
crosses. The pooled sperm are then used to fertilize the pooled eggs and the process is carried
out recursively. After several generations the resulting progeny can then be subjected to
selections and screens for desired properties, such as size, disease resistance, etc.

The above strategy is illustrated schematically in Figure 29.

3. Animals 

The advent of in vitro fertilization and surrogate motherhood provides a means
of whole genome shuffling in animals such as mammals. As with fish, the eggs and the sperm
from a population, for example from all slaughter cows, are collected and pooled. The pooled
eggs are then in vitro fertilized with the pooled sperm. The resulting embryos are then
returned to surrogate mothers for development. As above, this process is repeated recursively
until a large diverse population is generated that can be screened for desirable traits.

A technically feasible approach would be similar to that used for plants. In this
case, sperm from the males of the starting population is collected and pooled, and then this
pooled sample is used to artificially inseminate multiple females from each of the starting
populations. Only one (or a few) sperm would succeed in each animal, but these should be
different for each fertilization. The process is reiterated by harvesting the sperm from all of the
male progeny, pooling it, and using it to fertilize all of the female progeny. The process is
carried out recursively for several generations to generate the organism library, which can then
be screened.

I. RAPID EVOLUTION AS A PREDICTIVE TOOL

Recursive sequence recombination can be used to simulate natural evolution of
pathogenic microorganisms in response to exposure to a drug under test. Using recursive
sequence recombination, evolution proceeds at a faster rate than in natural evolution. One
measure of the rate of evolution is the number of cycles of recombination and screening
required until the microorganism acquires a defined level of resistance to the drug. The
information from this analysis is of value in comparing the relative merits of different drugs
and, in particular, in predicting their long term efficacy on repeated administration.
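The rate measure described above, the number of cycles needed to reach a defined resistance
level, can be expressed as a short counting routine. This is a minimal sketch: `run_cycle`
is a hypothetical callback standing in for one laboratory round of recombination and
screening, and the per-cycle gains used in the example are invented purely for illustration.

```python
def cycles_to_resistance(run_cycle, start_level, target_level, max_cycles=50):
    """Count recombination/screening cycles until resistance reaches target_level.

    run_cycle maps the current resistance level to the level after one more cycle.
    Returns None if the target is not reached within max_cycles.
    """
    level = start_level
    for cycle in range(1, max_cycles + 1):
        level = run_cycle(level)
        if level >= target_level:
            return cycle
    return None


if __name__ == "__main__":
    # Hypothetical drugs: resistance to drug A doubles each cycle, while
    # resistance to drug B rises by only 10% per cycle.
    drug_a = cycles_to_resistance(lambda r: r * 2.0, 1.0, 16.0)
    drug_b = cycles_to_resistance(lambda r: r * 1.1, 1.0, 16.0)
    print("cycles to 16-fold resistance, drug A:", drug_a)
    print("cycles to 16-fold resistance, drug B:", drug_b)
```

Under this measure, a drug that requires more cycles before the target resistance level is
reached would be predicted to retain its efficacy longer on repeated administration.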

The pathogenic microorganisms used in this analysis include the bacteria that
are a common source of human infections, such as chlamydia, rickettsial bacteria,
mycobacteria, staphylococci, streptococci, pneumococci, meningococci and gonococci,
klebsiella, proteus, serratia, pseudomonas, legionella, diphtheria, salmonella, bacilli,
cholera, tetanus, botulism, anthrax, plague, leptospirosis, and Lyme disease bacteria.
Evolution is effected by transforming an isolate of bacteria that is sensitive to a drug under test
with a library of DNA fragments. The fragments can be a mutated version of the genome of
the bacteria being evolved. If the target of the drug is a known protein or nucleic acid, a
focused library containing variants of the corresponding gene can be used. Alternatively, the
library can come from other kinds of bacteria, especially bacteria typically found inhabiting
human tissues, thereby simulating the source material available for recombination in vivo. The
library can also come from bacteria known to be resistant to the drug. After transformation
and propagation of bacteria for an appropriate period to allow for recombination to occur and
recombinant genes to be expressed, the bacteria are screened by exposing them to the drug
under test and then collecting survivors. Surviving bacteria are subject to further rounds of
recombination. The subsequent round can be effected by a split and pool approach in which
DNA from one subset of surviving bacteria is introduced into a second subset of bacteria.
Alternatively, a fresh library of DNA fragments can be introduced into surviving bacteria.
Subsequent round(s) of selection can be performed at increasing concentrations of drug,
thereby increasing the stringency of selection.

A similar strategy can be used to simulate viral acquisition of drug resistance.
The object is to identify drugs for which resistance can be acquired only slowly, if at all. The
viruses to be evolved are those that cause infections in humans for which at least modestly
effective drugs are available. Substrates for recombination can come from induced mutants
[...] (e.g., nucleotide analogs which inhibit the reverse transcriptase gene of HIV), focused
libraries containing variants of the target gene can be produced. Recombination of a viral
genome with a library of fragments is usually performed in vitro. However, in situations in
which the library of fragments constitutes variants of viral genomes or fragments that can be
encompassed in such genomes, recombination can also be performed in vivo, e.g., by
transfecting cells with multiple substrate copies (see Section V). For screening, recombinant
viral genomes are introduced into host cells susceptible to infection by the virus and the cells
are exposed to a drug effective against the virus (initially at low concentration). The cells can
be spun to remove any noninfected virus. After a period of infection, progeny viruses can be
collected from the culture medium, the progeny viruses being enriched for viruses that have
acquired at least partial resistance to the drug. Alternatively, virally infected cells can be
plated in a soft agar lawn and resistant viruses isolated from plaques. Plaque size provides
some indication of the degree of viral resistance.

Progeny viruses surviving screening are subject to additional rounds of
recombination and screening at increased stringency until a predetermined level of drug
resistance has been acquired. The predetermined level of drug resistance may reflect the
maximum dosage of a drug practical to administer to a patient without intolerable side effects.
The analysis is particularly valuable for investigating acquisition of resistance to various
combinations of drugs, such as the growing list of approved anti-HIV drugs (e.g., AZT, ddI,
ddC, d4T, TIBO 82150, nevirapine, 3TC, crixivan and ritonavir).

J. THE EVOLUTIONARY IMPORTANCE OF RECOMBINATION 
Strain improvement is the directed evolution of an organism to be more "fit"
for a desired task. In nature, adaptation is facilitated by sexual recombination. Sexual
recombination allows a population to exploit the genetic diversity within it, e.g., by
consolidating useful mutations and discarding deleterious ones. In this way, adaptation and
evolution can proceed in leaps. In the absence of a sexual cycle, members of a population
must evolve independently by accumulating random mutations sequentially. Many useful
mutations are lost while deleterious mutations can accumulate. Adaptation and evolution in
this way proceeds slowly as compared to sexual evolution.

As shown in Fig. 17, asexual evolution is a slow and inefficient process.
Populations move as individuals rather than as groups. A diverse population is generated by
the mutagenesis of a single parent resulting in a distribution of fit and unfit individuals. In the
absence of a sexual cycle, each piece of genetic information of the surviving population
remains in the individual mutants. Selection of the "fittest" results in many "fit" individuals
being discarded along with the useful genetic information they carry. Asexual evolution
proceeds one genetic event at a time and is thus limited by the intrinsic value of a single
genetic event. Sexual evolution moves more quickly and efficiently. Mating within a
population consolidates genetic information within the population and results in useful
mutations being combined together. The combining of useful genetic information results in
progeny that are much more fit than their parents. Sexual evolution thus proceeds much faster
by multiple genetic events.

Years of plant and animal breeding have demonstrated the power of employing
sexual recombination to effect the rapid evolution of complex genomes towards a particular
task. This general principle is further demonstrated by using DNA shuffling to recombine
DNA molecules in vitro to accelerate the rate of directed molecular evolution. The strain
improvement efforts of the fermentation industry rely on the directed evolution of
microorganisms by sequential random mutagenesis. Incorporation of recombination into this
iterative process [...] the profitability of current fermentation products.

DNA shuffling includes the recursive recombination of DNA sequences. A
significant difference between DNA shuffling and natural sexual recombination is that DNA
shuffling can produce DNA sequences originating from multiple parental sequences, while
natural sexual recombination is pairwise and produces sequences originating from only two
parents.
As shown in Figure 25, the rate of evolution is in part limited by the number of
useful mutations that a member of a population can accumulate between selection events. In
sequential random mutagenesis, useful mutations are accumulated one per selection event.
Many useful mutations are discarded each cycle in favor of the best performer, and neutral or
deleterious mutations which survive are as difficult to lose as they were to gain and thus
accumulate. In sexual evolution, pairwise recombination allows mutations from two different
parents to segregate and recombine in different combinations. Useful mutations can
accumulate, and deleterious mutations can be lost. Poolwise recombination, such as that
[...] mutations from many parents to consolidate into a single progeny. Thus poolwise
recombination provides a means for increasing the number of useful mutations that can
accumulate each selection event. The graph in Fig. 25 shows a plot of the potential number of
mutations an individual can accumulate by each of these processes. Recombination is
exponentially superior to sequential random mutagenesis, and this advantage increases
exponentially with the number of parents that can recombine. Sexual recombination is thus
more conservative. In nature, the pairwise nature of sexual recombination may provide
important stability within a population by impeding the large changes in DNA sequence that
can result from poolwise recombination. For the purposes of directed evolution, however,
poolwise recombination is more efficient.

The potential diversity that can be generated from a population is greater as a
result of poolwise recombination as compared to that resulting from pairwise recombination.
Further, poolwise recombination enables the combining of multiple beneficial mutations
originating from multiple parental sequences.

To demonstrate the importance of poolwise recombination vs. pairwise
recombination in the generation of molecular diversity, consider the breeding of ten
independent DNA sequences each containing only one unique mutation. There are 2^10 = 1024
different combinations of those ten mutations, ranging from a single sequence having no
mutations (the consensus) to that having all ten mutations. If this pool were recombined
together by pairwise recombination, a population containing the consensus, the ten parents, and
the 45 different combinations of any two of the mutations would result, i.e., 56 or ca. 5% of the
possible 1024 mutant combinations. Alternatively, if the pool were recombined together in a
poolwise fashion, all 1024 would be theoretically generated, resulting in an approximately 20
fold increase in library diversity. When looking for a unique solution to a problem in
molecular evolution, the more complex the library, the more complex the possible solution.
Indeed, the most fit member of a shuffled library often contains several mutations originating
from several independent starting sequences.
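The arithmetic of the ten-mutation example can be checked directly. The short script below
is only a verification aid; the variable names are illustrative and do not appear in the
specification.

```python
from math import comb

n_mutations = 10

# Every subset of the ten mutations is a distinct combination.
all_combinations = 2 ** n_mutations

# A single round of pairwise recombination yields sequences carrying
# 0 mutations (consensus), 1 mutation (parents), or 2 mutations.
pairwise_reachable = sum(comb(n_mutations, k) for k in range(3))

fraction = pairwise_reachable / all_combinations
fold_increase = all_combinations / pairwise_reachable

print(f"all combinations:    {all_combinations}")
print(f"pairwise reachable:  {pairwise_reachable}")
print(f"fraction reachable:  {fraction:.1%}")
print(f"poolwise/pairwise:   {fold_increase:.1f}-fold")
```

The pairwise total is 1 + 10 + 45 = 56, which is about 5.5% of 1024, and the ratio 1024/56 is
about 18, consistent with the "approximately 20 fold" figure given in the text.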

1. DNA Shuffling Provides Recursive Pairwise Recombination 
In vitro DNA shuffling results in the efficient production of combinatorial
genetic libraries by catalyzing the recombination of multiple DNA sequences. While the result
of DNA shuffling is a population representing the poolwise recombination of multiple
sequences, the process does not rely on the recombination of multiple DNA sequences
simultaneously, but rather on their recursive pairwise recombination. The assembly of
complete genes from a mixed pool of small gene fragments requires multiple annealing and
elongation cycles, the thermal cycles of the primerless PCR reaction. During each thermal
cycle many pairs of fragments anneal and are extended to form a combinatorial population of
larger chimeric DNA fragments. After the first cycle of reassembly, chimeric fragments
contain sequence originating from predominantly two different parent genes, with all possible
pairs of "parental" sequence theoretically represented. This is similar to the result of a single
sexual cycle within a population. During the second cycle, these chimeric fragments anneal
with each other or with other small fragments, resulting in chimeras originating from up to
four of the different starting sequences, again with all possible combinations of the four
parental sequences theoretically represented. This second cycle is analogous to the entire
population resulting from a single sexual cross, both parents and offspring, inbreeding.

Further cycles result in chimeras originating from 8, 16, 32, etc. parental
sequences and are analogous to further inbreedings of the preceding population. This could be
considered similar to the diversity generated from a small population of birds that are isolated
on an island, breeding with each other [...] of "poolwise" recombination, but the [...]
reason, the DNA molecules generated from in vitro [...] the starting "parental" sequences, but
[...] thermal cycles) grand progeny of the starting "ancestor" molecules.
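The doubling pattern of recursive pairwise reassembly (2, 4, 8, 16, 32 parental sequences
after successive thermal cycles) is an upper bound that is easy to tabulate. The helper names
below are hypothetical, and the model assumes the idealized case in which every annealing
event joins fragments of entirely different parentage.

```python
def max_parents_per_chimera(cycles, pool_size):
    """Upper bound on parental sequences represented in one chimera."""
    return min(2 ** cycles, pool_size)


def cycles_to_combine_all(pool_size):
    """Fewest thermal cycles after which one chimera could draw on every parent."""
    cycles = 0
    while max_parents_per_chimera(cycles, pool_size) < pool_size:
        cycles += 1
    return cycles


if __name__ == "__main__":
    for c in range(1, 6):
        print(f"cycle {c}: up to {max_parents_per_chimera(c, 100)} parents")
    print("cycles needed for a pool of 10 parents:", cycles_to_combine_all(10))
```

Real reassembly reactions fall short of this bound because many annealing events pair
fragments that share parents.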

L. FERMENTATION

The fermentation of [...] the oldest and most sophisticated application of biocatalysis.
Industrial microorganisms effect [...] reactor and in so doing catalyze a multi-billion dollar
industry. Fermentation products range from fine and commodity chemicals such as ethanol,
lactic acid [...] high value small molecule pharmaceuticals [...]. See, e.g., McCoy (1998)
C&EN 13-19 for an introduction to biocatalysis. Success in bringing these products [...]
market depends on continuous improvement of the whole cell biocatalysts. Improvements
include increased yield of desired products, removal of unwanted co-metabolites, improved
utilization of inexpensive carbon and nitrogen sources, and adaptation to fermenter conditions,
[...] production of a [...], production of a secondary metabolite, increased tolerance to acidic
conditions, increased tolerance to basic conditions, and increased tolerance to high or low
temperatures. Shortcomings in any of these areas can result in high manufacturing costs,
inability to capture or maintain market share, and failure to bring promising products to
market. For this reason, the fermentation industry invests significant financial and personnel
resources in the improvement of production strains.
Current strategies for strain improvement rely on the empirical [...]
modification of fermenter conditions and genetic manipulation of the producing organism.
While advances in the molecular biology of established industrial organisms have been made,
rational metabolic engineering is information intensive and is not broadly applicable to less
characterized industrial strains. The most widely practiced strategy for strain improvement
employs random mutagenesis of the producing strain and screening for mutants having
improved properties. For mature strains, those subjected to many rounds of improvement,
these efforts routinely provide a 10% increase in product titre per year. Although effective,
this classic strategy is slow, laborious, and expensive. Technological advances in this area are
aimed at automation and increasing sample screening throughput in hopes of reducing the cost
of strain improvement. However, the real technical barrier resides in the intrinsic limitation of
single mutations to effect significant strain improvement. The methods herein overcome this
limitation and provide access to multiple useful mutations per cycle which can be used to
complement automation technologies and catalyze strain improvement processes.

The methods herein allow biocatalysts to be improved at a faster pace than
conventional methods. Whole genome shuffling can at least double the rate of strain
improvement for microorganisms used in fermentation as compared to traditional methods.
This provides for a relative decrease in the cost of fermentation processes. New products can
enter the market sooner, producers can increase profits as well as market share, and
consumers gain access to more products of higher quality and at lower prices. Further,
increased efficiency of production processes translates to less waste production and more
frugal use of resources. Whole genome shuffling provides a means of accumulating multiple
useful mutations per cycle and thus eliminates the inherent limitation of current strain
improvement programs (SIPs).

DNA shuffling provides recursive mutagenesis, recombination, and selection of
DNA sequences. A key difference between DNA shuffling-mediated recombination and
natural sexual recombination is that DNA shuffling effects both the pairwise (two parents) and
the poolwise (multiple parents) recombination of parent molecules, as described supra.
Natural recombination is more conservative and is limited to pairwise recombination. In
nature, pairwise recombination provides stability within a population by preventing large leaps
in sequences or genomic structure that can result from poolwise recombination. However, for
the purposes of directed evolution, poolwise recombination is appealing since the beneficial
mutations of multiple parents can be combined during a single cross to produce a superior
offspring. Poolwise recombination is analogous to the crossbreeding of inbred strains in
classic strain improvement, except that the crosses occur between many strains at once. In
essence, poolwise recombination is a sequence of events that effects the recombination of a
population of nucleic acid sequences and results in the generation of new nucleic acids that
contain genetic information from more than two of the original nucleic acids. The power of in
vitro DNA shuffling is that large combinatorial libraries can be generated from a small pool of
DNA fragments reassembled by recursive pairwise annealing and extension reactions, or
"matings." Many of the in vivo recombination formats described (such as plasmid-plasmid,
plasmid-chromosome, phage-phage, phage-chromosome, phage-plasmid, conjugal
DNA-chromosome, exogenous DNA-chromosome, chromosome-chromosome, with the DNA being
introduced into the cell by natural and non-natural competence, transduction, transfection,
conjugation, protoplast fusion, etc.) result primarily in the pairwise recombination of two
DNA molecules. Thus, these formats, when executed for only a single cycle of recombination,
are inherently limited in their potential to generate molecular diversity. To generate the level
of diversity obtained by in vitro DNA shuffling methods, pairwise mating formats must be
carried out recursively, i.e., for many generations, prior to screening for improved sequences.
Thus a pool of DNA sequences, such as four independent chromosomes, must be recombined,
for example by protoplast fusion, and the progeny of that recombination (each representing a
unique outcome of the pairwise mating) must then be pooled, without selection, and then
recombined again, and again, and again. This process should be repeated for a sufficient
number of cycles to result in progeny having the desired complexity. Only once sufficient
diversity has been generated should the resulting population be screened for new and
improved sequences.

There are a few general methods for effecting efficient recombination in
prokaryotes. Bacteria have no known sexual cycle per se, but there are natural mechanisms by
which the genomes of these organisms undergo recombination. These mechanisms include
natural competence, phage-mediated transduction, and cell-cell conjugation. Bacteria that are
naturally competent are capable of efficiently taking up naked DNA from the environment. If
[...] genetic exchange. Bacillus subtilis, the primary production organism of the enzyme
industry, is known for the efficiency with which it carries out this process.

In generalized transduction, a bacteriophage mediates genetic exchange. A
transducing phage will often package headfuls of the host genome. These phage can infect a
new host and deliver a fragment of the former host genome, which is frequently integrated via
homologous recombination. Cells can also transfer DNA between themselves by conjugation.
Cells containing the appropriate mating factors transfer episomes as well as entire
chromosomes to an appropriate acceptor cell, where they can recombine with the acceptor
genome. Conjugation resembles sexual recombination for microbes and can be intraspecific,
interspecific, and intergeneric. For example, an efficient means of transforming Streptomyces
sp., a genus responsible for producing many commercial antibiotics, is by the conjugal
transfer of plasmids from Escherichia coli.

For many industrial microorganisms, knowledge of competence, transducing
phage, or fertility factors is lacking. Protoplast fusion has been developed as a versatile and
general alternative to these natural methods of recombination. Protoplasts are prepared by
removing the cell wall by treating cells with lytic enzymes in the presence of osmotic
stabilizers. In the presence of a fusogenic agent, such as polyethylene glycol (PEG),
protoplasts are induced to fuse and form transient hybrids or "fusants." During this hybrid
state, genetic recombination occurs at high frequency, allowing the genomes to reassort. The
final crucial step is the successful segregation and regeneration of viable cells from the fused
protoplasts. Protoplast fusion can be intraspecific, interspecific, and intergeneric and has been
applied to both prokaryotes and eukaryotes. In addition, it is possible to fuse more than two
cells, thus providing a mechanism for effecting poolwise recombination. While no fertility
factors, transducing phages or competency development are needed for protoplast fusion, a
method for the formation, fusing, and regeneration of protoplasts is typically optimized for
each organism. Protoplast fusion as applied to poolwise recombination is described in more
detail, supra.

One key to SIP is having an assay that can be dependably used to identify a few 
mutants out of thousands that have subtle increases in product yield. The limiting factor in 
many assay formats is the uniformity of cell growth. This variation is the source of baseline
variability in subsequent assays. Inoculum size and culture environment
(temperature/humidity) are sources of cell growth variation. Automation of all aspects of
establishing initial cultures and state-of-the-art temperature and humidity controlled incubators
are useful in reducing variability.
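
The link between baseline variability and the ability to detect a subtle improvement can be illustrated with a short sketch. The Python below is illustrative only and is not part of the described method: the function names, the use of replicate parent-strain wells as the baseline, and the cutoff of three standard deviations above the parent mean are all assumptions made for the example.

```python
from statistics import mean, stdev


def baseline_cv(parent_readings):
    """Coefficient of variation of replicate parent-strain wells."""
    return stdev(parent_readings) / mean(parent_readings)


def hit_threshold(parent_readings, k=3.0):
    """Assumed cutoff: parent mean plus k sample standard deviations."""
    return mean(parent_readings) + k * stdev(parent_readings)


def call_hits(mutant_readings, parent_readings, k=3.0):
    """Return the names of mutants whose reading exceeds the cutoff."""
    cutoff = hit_threshold(parent_readings, k)
    return [name for name, value in mutant_readings.items() if value > cutoff]


# Hypothetical product readings from five replicate parent wells.
parent = [100.0, 102.0, 98.0, 101.0, 99.0]
mutants = {"m1": 104.0, "m2": 106.0}
print(round(baseline_cv(parent), 4), call_hits(mutants, parent))
```

Under these assumptions, tighter control of inoculum size and incubation lowers the parent standard deviation, which lowers the cutoff and lets smaller improvements register as hits.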

Mutant cells or spores are separated on solid media to produce individual
sporulating colonies. Using an automated colony picker (Q-bot, Genetix, U.K.), colonies are
identified, picked, and 10,000 different mutants inoculated into 96 well microtitre dishes
containing two 3 mm glass balls/well. The Q-bot does not pick an entire colony but rather
inserts a pin through the center of the colony and exits with a small sampling of cells (or
mycelia) and spores. The time the pin is in the colony, the number of dips to inoculate the
culture medium, and the time the pin is in that medium each affect inoculum size, and each can
be controlled and optimized. The uniform process of the Q-bot decreases human handling
error and increases the rate of establishing cultures (roughly 10,000 per 4 hours). These cultures
are then shaken in a temperature and humidity controlled incubator. The glass balls act to
promote uniform aeration of cells and the dispersal of mycelial fragments similar to the blades
of a fermenter. An embodiment of this procedure is further illustrated in Fig. 28, including an
integrated system for the assay:

1. Prescreen
The ability to detect a subtle increase in the performance of a mutant over that
of a parent strain relies on the sensitivity of the assay. The chance of finding the organisms
having an improvement is increased by the number of individual mutants that can be screened
by the assay. To increase the chances of identifying a pool of sufficient size, a prescreen that
increases the number of mutants processed by 10-fold can be used. The goal of the primary
screen will be to quickly identify mutants having equal or better product titres than the parent
strain(s) and to move only these mutants forward to liquid cell culture.

The primary screen is an agar plate screen that is analyzed by the Q-bot colony
picker. Although assays can be fundamentally different, many result, e.g., in the production of 
colony halos. For example, antibiotic production is assayed on plates using an overlay of a 
sensitive indicator strain, such as B. subtilis. Antibiotic production is typically assayed as a 
zone of clearing (inhibited growth of the indicator organism) around the producing organism. 
Similarly, enzyme production can be assayed on plates containing the enzyme substrate, with 
activity being detected as a zone of substrate modification around the producing colony. 
Product titre is correlated with the ratio of halo area to colony area.

The Q-bot or other automated system is instructed to only pick colonies having [...]
the plate prescreen. This increases the number of improved clones in the secondary assay and
eliminates the wasted effort of screening knock-out and low producers. This improves the "hit 
rate" of the secondary assay. 
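
The plate prescreen described above, in which product titre is correlated with the ratio of halo area to colony area, can be sketched as a simple filter. The following Python is a minimal illustration under stated assumptions: the `Colony` record, its field names, and the rule of picking every colony whose ratio is at least that of the parent strain are hypothetical and do not describe the Q-bot software.

```python
from dataclasses import dataclass


@dataclass
class Colony:
    colony_id: str
    colony_area_mm2: float
    halo_area_mm2: float


def halo_ratio(colony):
    """Ratio of halo area to colony area, used here as a proxy for titre."""
    if colony.colony_area_mm2 <= 0:
        raise ValueError("colony area must be positive")
    return colony.halo_area_mm2 / colony.colony_area_mm2


def select_for_picking(colonies, parent_ratio):
    """Keep colonies whose ratio is at least the parent ratio (assumed rule)."""
    return [c.colony_id for c in colonies if halo_ratio(c) >= parent_ratio]


# Hypothetical image-analysis output for three colonies on one plate.
plate = [
    Colony("A", colony_area_mm2=10.0, halo_area_mm2=30.0),
    Colony("B", colony_area_mm2=10.0, halo_area_mm2=20.0),
    Colony("C", colony_area_mm2=5.0, halo_area_mm2=20.0),
]
print(select_for_picking(plate, parent_ratio=3.0))
```

Normalizing halo area by colony area keeps a large but poorly producing colony from outranking a small, high-producing one.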

M. PROMOTION OF GENETIC EXCHANGE
1. General

Some methods of the invention effect recombination of cellular DNA by 
propagating cells under conditions inducing exchange of DNA between cells. DNA exchange 
can be promoted by generally applicable methods such as electroporation, biolistics, cell
fusion, or in some instances, by conjugation, transduction, or Agrobacterium-mediated transfer
and meiosis. For example, Agrobacterium can transform S. cerevisiae with T-DNA, which is
incorporated into the yeast genome by both homologous recombination and a gap repair
mechanism. (Piers et al., Proc. Natl. Acad. Sci. 93(4), 1613-8 (1996)).

In some methods, initial diversity between cells (i.e., before genome exchange)
is induced by chemical or radiation-induced mutagenesis of a progenitor cell type, optionally 
followed by screening for a desired phenotype. In other methods, diversity is natural as where 
cells are obtained from different individuals, strains or species.

In some shuffling methods, induced exchange of DNA is used as the sole means
of effecting recombination in each cycle of recombination. In other methods, induced
exchange is used in combination with natural sexual recombination of an organism. In other
methods, induced exchange and/or natural sexual recombination are used in combination with
the introduction of a fragment library. Such a fragment library can be a whole genome, a
whole chromosome, a group of functionally or genetically linked genes, a plasmid, a cosmid, a
mitochondrial genome, a viral genome (replicative and nonreplicative) or specific or random
fragments of any of these. The DNA can be linked to a vector or can be in free form. Some
vectors contain sequences promoting homologous or nonhomologous recombination with the
host genome. Some fragments contain double stranded breaks such as caused by shearing
with glass beads, sonication, or chemical or enzymatic fragmentation, to stimulate
recombination.

In each case, DNA can be exchanged between cells after which it can undergo 
recombination to form hybrid genomes. Generally, cells are recursively subjected to
recombination to increase the diversity of the population prior to screening. Cells bearing
hybrid genomes, e.g., generated after at least one, and usually several cycles of recombination,
are screened for a desired phenotype, and cells having this phenotype are isolated. These cells
can additionally form starting materials for additional cycles of recombination in a recursive
recombination/selection scheme.
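
The recursive recombination/selection scheme can be pictured with a toy simulation. The Python below is a sketch under simplifying assumptions and is not a model of any particular organism: genomes are represented as lists of alleles, each progeny locus is drawn from a randomly chosen pool member (a crude stand-in for poolwise recombination), and the fitness function and all parameter names are hypothetical.

```python
import random


def recombine(pool, progeny, rng):
    """One cycle of poolwise recombination: every locus of each progeny
    genome is copied from a randomly chosen member of the pool."""
    n_loci = len(pool[0])
    return [[rng.choice(pool)[i] for i in range(n_loci)] for _ in range(progeny)]


def screen(pool, fitness, keep):
    """Screening step: retain the `keep` genomes scoring highest."""
    return sorted(pool, key=fitness, reverse=True)[:keep]


def evolve(pool, fitness, rounds, cycles_per_round, progeny, keep, seed=0):
    """Several recombination cycles per round, then one screen per round;
    survivors become the starting material for the next round."""
    rng = random.Random(seed)
    for _ in range(rounds):
        for _ in range(cycles_per_round):
            pool = recombine(pool, progeny, rng)
        pool = screen(pool, fitness, keep)
    return pool


# Four hypothetical parents, each carrying one beneficial allele (1).
parents = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
survivors = evolve(parents, fitness=sum, rounds=3, cycles_per_round=2,
                   progeny=50, keep=5)
print(survivors)
```

In this sketch, running more than one recombination cycle before each screen mirrors the statement that diversity is increased prior to screening.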

One means of promoting exchange of DNA between cells is by fusion of cells,
such as by protoplast fusion. A protoplast results from the removal from a cell of its cell wall,
leaving a membrane-bound cell that depends on an isotonic or hypertonic medium for
maintaining its integrity. If the cell wall is partially removed, the resulting cell is strictly
referred to as a spheroplast and if it is completely removed, as a protoplast. However, here
the term protoplast includes spheroplasts unless otherwise indicated.

Protoplast fusion is described by Shaffner et al., Proc. Natl. Acad. Sci. USA 77,
2163 (1980) and other exemplary procedures are described by Yoakum et al., US 4,608,339,
Takahashi et al., US 4,677,066 and Sambrooke et al., at Ch. 16. Protoplast fusion has been
reported between strains, species, and genera (e.g., yeast and chicken erythrocyte).

Protoplasts can be prepared for both bacterial and eukaryotic cells, including
mammalian cells and plant cells, by several means including c[...]
walls. For example, cell walls can be stripped by digestion with a cell wall degrading enzyme
such as lysozyme in a 10-20% sucrose, 50 mM EDTA buffer. Conversion of cells to spherical
protoplasts can be monitored by ph[...]
by propagation of cells in media supplemented with an inhibitor of cell wall synthesis, or use of
mutant strains lacking capacity for cell wall formation. Preferably, eukaryotic cells are
synchronized in G1 phase by arrest with inhibitors such as α-factor, K. lactis killer toxin,
leflonamide and adenylate cyclase inhibitors. Optionally, some but not all protoplasts to be
fused can be killed and/or have their DNA fragmented by treatment with ultraviolet irradiation,
hydroxylamine or cupferon (Reeves et al., FEMS Microbiol. Lett. 99, 193-198 (1992)). In
this situation, killed protoplasts are referred to as donors, and viable protoplasts as acceptors.
Using dead donor cells can be advantageous in subsequently recognizing fused cells with
hybrid genomes, as described below. Further, breaking up DNA in donor cells is
advantageous for stimulating recombination with acceptor DNA. Optionally, acceptor and/or
fused cells can also be briefly, but nonlethally, exposed to UV irradiation further to stimulate
recombination.

Once formed, protoplasts can be stabilized in a variety of osmolytes and
compounds such as sodium chloride, potassium chloride, sodium phosphate, potassium
phosphate, sucrose, sor[...]
reducing agent, and osmotic stabilizer can be optimized for different cell types. Protoplasts
can be induced to fuse by treatment with a chemical such as PEG, calcium chloride or calcium
propionate or electrofusion (Tsoneva, Acta Microbiologica Bulgaria 24, 53-59 (1989)). A
method of cell fusion employing electric fields has also been described. See Chang, US
4,970,154. Conditions can be optimized for different strains.

The fused cells are heterokaryons containing genomes from two or more 
component protoplasts. Fused cells can be enriched from unfused parental cells by sucrose 
gradient sedimentation or cell sorting. The two nuclei in the heterokaryons can fuse 
(karyogamy) and homologous recombination can occur between the genomes. The
chromosomes can also segregate asymmetrically resulting in regenerated protoplasts that have
lost or gained whole chromosomes. The frequency of recombination can be increased by
treatment with ultraviolet irradiation or by use of strains overexpressing recA or other
recombination genes, or the yeast rad genes, and cognate variants thereof in other species, or
by the inhibition of gene products of MutS, MutL, or MutD. Overexpression can be either the
result of introduction of exogenous recombination genes or the result of selecting strains,
which as a result of natural variation or induced mutation, overexpress endogenous
recombination genes. The fused protoplasts are propagated under conditions allowing
regeneration of cell walls, recombination and segregation of recombinant genomes into
progeny cells from the heterokaryon and expression of recombinant genes. This process can
be repeated iteratively to increase the diversity of any set of protoplasts or cells. After, or
occasionally before or during, recovery of fused cells, the cells are screened or selected for
evolution toward a desired property.

Thereafter a subsequent round of recombination can be performed by preparing
protoplasts from the cells surviving selection/screening in a previous round. The protoplasts
are fused, recombination occurs in fused protoplasts, and cells are regenerated from the fused
protoplasts. This process can again be repeated iteratively to increase the diversity of the
starting population. Protoplasts, regenerated or regenerating cells are subject to further
selection or screening.

Subsequent rounds of recombination can be performed on a split pool basis as
described above. That is, a first subpopulation of cells surviving selection/screening from a
previous round is used for protoplast formation. A second subpopulation of cells surviving
selection/screening from a previous round is used as a source for DNA library preparation.
The DNA library from the second subpopulation of cells is then transformed into the
protoplasts from the first subpopulation. The library undergoes recombination with the
genomes of the protoplasts to form recombinant genomes. This process can be repeated
several times in the absence of a selection event to increase the diversity of the cell population.
Cells are regenerated from protoplasts, and selection/screening is applied to regenerating or
regenerated cells. In a further variation, a fresh library of nucleic acid fragments is introduced
into protoplasts surviving selection/screening from a previous round.

An exemplary format for shuffling using protoplast fusion is shown in Fig. 5.
The figure shows the following steps: protoplast formation of donor and recipient strains,
heterokaryon formation, karyogamy, recombination, and segregation of recombinant genomes
into separate cells. Optionally, the recombinant genomes, if having a sexual cycle, can
undergo further recombination with each other as a result of meiosis and mating. Recursive
cycles of protoplast fusion, or recursive mating/meiosis, are often used to increase the diversity
of a cell population. After achieving a sufficiently diverse population via one of these forms of
recombination, cells are screened or selected for a desired property. Cells surviving
selection/screening can then be used as the starting materials in a further cycle of protoplasting or
other recombination methods as noted herein.

2. Selection For Hybrid Strains

The invention provides selection strategies to identify cells formed by fusion of
components from parental cells from two or more distinct subpopulations. Selection for
hybrid cells is usually performed before selecting or screening for cells that have evolved (as a
result of genetic exchange) to acquisition of a desired property. A basic premise of most such
selection schemes is that two initial subpopulations have two distinct markers. Cells with
hybrid genomes can thus be identified by selection for both markers.

In one such scheme, at least one subpopulation of cells bears a selective marker
attached to its cell membrane. Examples of suitable membrane [...]
fluorescein and rhodamine. The markers can be linked to amide or thiol groups or through
more specific derivation chemistries, such as iodo-acetates, iodoacetamides, maleimides.
For example, a marker can be attached as follows. Cells or protoplasts are washed with a
buffer (e.g., PBS), which does not interfere with the chemical coupling of a chemically active
ligand which reacts with amino groups of lysines or N-terminal amino groups of membrane
proteins. The ligand is either amine reactive itself (e.g., isothiocyanates, succinimidyl esters,
[...] heterobifunctional linker (e.g., EMCS, SIAB, SPDP, [...]
derivatized magnetic beads or other capturing solid supports. For example, the ligand can be
succinimidyl activated biotin (Molecular Probes Inc.: B-1606, B-2603; S-1515, S-1582). This
linker is reacted with amino groups of proteins residing in and on the surface of a cell. The
cells are then washed to remove excess labelling agent before contacting with cells from the
second subpopulation bearing a second selective marker.

The second subpopulation of cells can also bear a membrane marker, albeit a
different membrane marker from the first subpopulation. Alternatively, the second
subpopulation can bear a genetic marker. The genetic marker can confer a selective property
such as drug resistance or a screenable property, such as expression of green fluorescent
protein.

After fusion of first and second subpopulations of cells, cells are
screened or selected for the presence of markers on both parental subpopulations. For
example, fusants are enriched for one population by adsorption to specific beads and these are
then sorted by FACS for those expressing a marker. Cells surviving both screens for both
markers are those having undergone protoplast fusion, and are therefore more likely to have
recombined genomes. Usually, the markers are screened or selected separately.
Membrane-bound markers, such as biotin, can be screened by affinity enrichment for the cell
membrane marker (e.g., by panning fused cells on an affinity matrix). For example, for a biotin
membrane label, cells can be affinity purified using streptavidin-coated magnetic beads
(Dynal). These beads are washed several times to remove the non-fused host cells.
Alternatively, cells can be panned against an antibody to the membrane marker. In a further
variation, if the membrane marker is fluorescent, cells bearing the marker can be identified by
FACS. Screens for genetic markers depend on the nature of the markers, and include capacity
to grow on drug-treated media or FACS selection for green fluorescent protein. If first and
second cell populations have fluorescent markers of different wavelengths, both markers can
be screened simultaneously by FACS sorting.
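
The two-step logic of this marker scheme, enrichment for a membrane marker followed by a screen for a second marker, can be sketched as sequential filters. The Python below is illustrative only: cells are modelled as dictionaries holding a set of marker names, and the marker names and function names are assumptions, not part of the described procedure.

```python
def fuse(cell_a, cell_b):
    """Model a fusant as carrying the markers of both parental cells."""
    return {
        "name": f"{cell_a['name']}x{cell_b['name']}",
        "markers": cell_a["markers"] | cell_b["markers"],
    }


def affinity_enrich(cells, membrane_marker):
    """Step 1: keep cells captured through the membrane marker (e.g., biotin)."""
    return [c for c in cells if membrane_marker in c["markers"]]


def sort_for_marker(cells, marker):
    """Step 2: keep cells that also score positive for the second marker."""
    return [c for c in cells if marker in c["markers"]]


def select_fusants(cells, first_marker, second_marker):
    """Apply the two screens one after the other."""
    return sort_for_marker(affinity_enrich(cells, first_marker), second_marker)


# Hypothetical parents: one biotin-labelled, one expressing GFP.
p1 = {"name": "P1", "markers": {"biotin"}}
p2 = {"name": "P2", "markers": {"GFP"}}
population = [p1, p2, fuse(p1, p2), fuse(p1, p1)]
print([c["name"] for c in select_fusants(population, "biotin", "GFP")])
```

Only the cell derived from both subpopulations passes both filters; unfused parents and self-fusions are removed at one step or the other.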

In a further selection scheme for hybrid cells, first and second populations of
cells to be fused express different subunits of a heteromultimeric enzyme. Usually, the
heteromultimeric enzyme has two different subunits, but heteromultimeric enzymes having
three, four or more different subunits can be used. If an enzyme has more than two different
subunits, each subunit can be expressed in a different subpopulation of cells (e.g., three
subunits in three subpopulations), or more than one subunit can be expressed in the same
subpopulation of cells (e.g., one subunit in one subpopulation, two subunits in a second
subpopulation). In the case where more than two subunits are used, selection for the poolwise
recombination of more than two protoplasts can be achieved.
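
The way a multi-subunit enzyme can report on poolwise fusion of more than two protoplasts can be made concrete with a small example. The Python below is a sketch under stated assumptions: subunit names are hypothetical, a fused cell is modelled simply as the union of the subunits expressed by its component protoplasts, and activity is assumed to require every subunit.

```python
def fuse(*protoplasts):
    """A fused cell expresses every subunit found in any component protoplast."""
    return frozenset().union(*protoplasts)


def has_active_enzyme(expressed_subunits, required_subunits):
    """Assumed rule: the enzyme is active only when all subunits are present."""
    return frozenset(required_subunits) <= frozenset(expressed_subunits)


# Hypothetical three-subunit enzyme, one subunit per subpopulation.
required = {"alpha", "beta", "gamma"}
pop1, pop2, pop3 = frozenset({"alpha"}), frozenset({"beta"}), frozenset({"gamma"})

print(has_active_enzyme(fuse(pop1, pop2), required))        # pairwise fusant
print(has_active_enzyme(fuse(pop1, pop2, pop3), required))  # three-way fusant
```

With one subunit per subpopulation, only a cell descended from all three subpopulations scores positive, which is the sense in which the assay selects for poolwise recombination.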

Hybrid cells representing a combination of genomes of first, second or more
subpopulation component cells can then be recognized by an assay for intact enzyme. Such an
assay can be a binding assay, but is more typically a functional assay (e.g., capacity to
metabolize a substrate of the enzyme). Enzymatic activity can be detected for example by
processing of a substrate to a product with a fluorescent or otherwise easily detectable
absorbance or emission spectrum. The individual subunits of a heteromultimeric enzyme used
in such an assay preferably have no enzymic activity in dissociated form, or at least have
significantly less activity in dissociated form than associated form. Preferably, the cells used
for fusion lack an endogenous form of the heteromultimeric enzyme, or at least have
significantly less endogenous activity than results from heteromultimeric enzyme formed by
fusion of cells.

Penicillin acylase enzymes, cephalosporin acylase and penicillin acyltransferase
are examples of suitable heteromultimeric enzymes. These enzymes are encoded by a single
gene, which is translated as a proenzyme and cleaved by posttranslational autocatalytic
proteolysis to remove a spacer endopeptide and generate two subunits, which associate to
form the active heterodimeric enzyme. Neither subunit is active in the absence of the other
subunit. However, activity can be reconstituted if [...]
in the same cell by co-transformation. Other enzymes that can be used have subunits that are
encoded by distinct genes (e.g., faoA and faoB genes encode 3-oxoacyl-CoA thiolase of
Pseudomonas fragi (Biochem. J. 328, 815-820 (1997))).

An exemplary enzyme is penicillin G acylase from Escherichia coli, which has 
two subunits encoded by a single gene. Fragments of the gene encoding the two subunits
operably linked to appropriate expression regulation sequences are transfected into first and
second subpopulations of cells, which lack endogenous penicillin acylase activity. A cell
formed by fusion of component cells from the first and second subpopulations expresses the
two subunits, which assemble to form functional enzyme, e.g., penicillin acylase. Fused cells
can then be selected on agar plates containing penicillin G, which is degraded by penicillin
acylase.

In another variation, fused cells are identified by complementation of 
auxotrophic mutants. Parental subpopulations [...] known auxotrophic
mutations. Alternatively, auxotrophic [...]
generated spontaneously by exposure to a mutagenic agent. Cells with auxotrophic mutations
are selected by replica plating on minimal and complete media. Lesions resulting in
auxotrophy are expected to be scattered throughout the genome, in genes for amino acid,
nucleotide, and vitamin biosynthetic pathways. After fusion of parental cells, cells resulting
from fusion can be identified by their capacity to grow on minimal media. These cells can then 
be screened or selected for evolution toward a desired property. Further steps of mutagenesis 
generating fresh auxotrophic mutations can be incorporated in subsequent cycles of 
recombination and screening/selection. 
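
Selection of fused cells by complementation of auxotrophic mutations follows a simple rule that can be written out explicitly. The Python below is a simplified sketch, not a genetic model: each parent is described only by the set of nutrients it cannot synthesize, a fused cell is assumed to require only those nutrients that every one of its parents requires, and the nutrient names are hypothetical.

```python
def fusant_requirements(*parent_requirements):
    """A fused cell still requires a nutrient only if every parent requires it,
    because any parent with a functional pathway complements the lesion."""
    return frozenset.intersection(*map(frozenset, parent_requirements))


def grows_on_minimal(requirements):
    """Growth on minimal medium needs an empty set of requirements."""
    return not requirements


# Hypothetical auxotrophic parents.
his_minus = {"histidine"}
leu_minus = {"leucine"}

print(grows_on_minimal(fusant_requirements(his_minus, leu_minus)))  # complementing pair
print(grows_on_minimal(fusant_requirements(his_minus, his_minus)))  # self-fusion
```

Unfused parents and fusions between cells carrying the same lesion fail to grow on minimal medium, so growth identifies cells derived from genetically distinct parents.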

In variations of the above method, de novo generation of auxotrophic 
mutations in each round of shuffling can be avoided by reusing the same auxotrophs. For
example, auxotrophs can be generated by transposon mutagenesis using a transposon bearing a
selective marker. Auxotrophs are identified by a screen such as replica plating. Auxotrophs
are pooled, and a generalized transducing phage lysate is prepared by growth of phage on a
population of auxotrophic cells. A separate population of auxotrophic cells is subjected to
genetic exchange, and complementation is used to select cells that have undergone genetic
exchange and recombination. These cells are then screened or selected for acquisition of a
desired property. Cells surviving screening or selection then have auxotrophic markers
regenerated by introduction of the transducing transposon library. The newly generated
auxotrophic cells can then be subjected to further genetic exchange and screening/selection.

In a further variation, auxotrophic mutations are generated by homologous 
recombination with a targeting vector comprising a selective marker flanked by regions of 
homology with a biosynthetic region of the genome of cells to be evolved. Recombination
between the vector and the genome inserts the positive selection marker into the genome
causing an auxotrophic mutation. The vector is in linear form before introduction into cells.
Optionally, the frequency of introduction of the vector can be increased by capping its ends
with self-complementary oligonucleotides annealed in a hairpin formation. Genetic
exchange and screening/selection proceed as described above. In each round, targeting
vectors are reintroduced regenerating the same population of auxotrophic markers.

In another variation, fused cells are identified by screening for a genomic 
marker present on one subpopulation of parental cells and an episomal marker present on a 
second subpopulation of cells. For example, a first subpopulation of yeast containing 
mitochondria can be used to complement a second subpopulation of yeast having a petite
phenotype (i.e., lacking mitochondria).

In a further variation, genetic exchange is performed between two 
subpopulations of cells, one of which is dead. Cells are preferably killed by brief exposure to 
DNA fragmenting agents such as hydroxylamine, cupferon, or irradiation. Viable cells are 
then screened for a marker present on the dead parental subpopulation. 

3. Liposome-mediated transfers

In the methods noted above, in which nucleic acid fragment libraries are
introduced into protoplasts, the nucleic acids are sometimes encapsulated in liposomes to
facilitate uptake by protoplasts. Liposome-mediated uptake of DNA by protoplasts is described
in Redford et al., Mol. Gen. Genet. 184, 567-569 (1981). Liposomes can efficiently deliver
large volumes of DNA to protoplasts (see Deshayes et al., EMBO J. 4, 2731-2737 (1985)).
See also, Philippot and Schuber (eds) (1995) Liposomes as Tools in Basic Research and
Industry CRC press, Boca Raton, e.g., Chapter 9, Remy et al., "Gene Transfer with Cationic
Amphiphiles." Further, the DNA can be delivered as linear fragments, which are often more
recombinogenic than whole genomes. In some methods, fragments are mutated prior to
encapsulation in liposomes. In some me[...]
homologs, or nucleases (e.g., restriction endonucleases) before encapsulation in liposomes to
promote recombination. Alternatively, protoplasts can be treated with lethal doses of nicking
reagents and then fused. Cells which survive are those which are repaired by recombination
with other genomic fr[...]
recombinant (and therefore desirably diverse) protoplasts.

4. Shuffling filamentous fungi
Filamentous fungi are particularly suited to performing the shuffling methods
described above. Filamentous fungi are divided into four main [...]
structures for sexual reproduction: Phycomycetes, Ascomycetes, Basidiomycetes and the
Fungi Imperfecti. Phycomycetes (e.g., Rhizopus, Mucor) form sexual spores in sporangium.
The spores can be uni or multinucleate and often lack septated hyphae (coenocytic).
Ascomycetes (e.g., Aspergillus, Neurospora, Penicillium) produce sexual spores in an ascus as
a result of meiotic division. Asci typically contain 4 meiotic products, but some contain 8 as a
result of additional mitotic division. Basidiomycetes include mushrooms, and smuts and form
sexual spores on the surface of a basidium. In holobasidiomycetes, such as mushrooms, the
basidium is undivided. In hemibasidiomycetes, such as rusts (Uredinales) and smut fungi
(Ustilaginales), the basidium is divided. Fungi imperfecti, which include most human [...]

Fungi can reproduce by asexual, sexual or parasexual means. Asexual 
reproduction involves vegetative growth of mycelia, nuclear division and cell division without
involvement of gametes and without nuclear fusion. Cell division can occur by sporulation,
budding or fragmentation of hyphae.

Sexual reproduction provides a mechanism for shuffling genetic material 
between cells. A sexual reproductive cycle is characterized by an alternation of a haploid phase
and a diploid phase. Diploidy occurs when two haploid gamete nuclei fuse (karyogamy). The
gamete nuclei can come from the same parental strains (self-fertile), such as in the homothallic
fungi. In heterothallic fungi, the parental [...]

A diploid cell converts to haploidy via meiosis, which essentially consists of 
two divisions of the nucleus accompanied by one division of the chromosomes. The products 
of one meiosis are a tetrad (4 haploid nuclei). In some cases, a mitotic division occurs after
meiosis, giving rise to eight product cells. The arrangement of the resultant cells (usually
enclosed in spores) resembles that of the parental strains. The length of the haploid and
diploid stages differs in various fungi: for example, the Basidiomycetes and many of the
Ascomycetes have a mostly haploid life cycle (that is, meiosis occurs immediately after
karyogamy), whereas others (e.g., Saccharomyces cerevisiae) are diploid for most of their life
cycle (karyogamy occurs soon after meiosis). Sexual reproduction can occur between cells in
the same strain (selfing) or between cells from different strains (outcrossing).

Sexual dimorphism (dioecism) is the separate production of male and female 
organs on different mycelia. This is a rare phenomenon among the fungi, although a few 
examples are known. Heterothallism (one locus-two alleles) allows for outcrossing between
cross-compatible strains which are self-incompatible. The simplest form is the two allele-one
locus system of mating types/factors, illustrated by the following organisms:
A and a in Neurospora; a and α in Saccharomyces; plus and minus in Schizosaccharomyces
and Zygomycetes; a1 and a2 in Ustilago.

Multiple-allelomorph heterothallism is exhibited by some of the higher
Basidiomycetes (e.g., Gasteromycetes and Hymenomycetes), which are heterothallic and have
several mating types determined by multiple alleles. Heterothallism in these organisms is either
bipolar with one mating type factor, or tetrapolar with two unlinked factors, A and B. Stable,
fertile heterokaryon formation depends on the presence of different A factors and, in the case
of tetrapolar organisms, of different B factors as well. This system is effective in the
promotion of outbreeding and the prevention of self-breeding. The number of different mating
factors may be very large (i.e., thousands) (Kothe, FEMS Microbiol. Rev. 18, 65-87 (1996)),
and non-parental mating factors may arise by recombination.
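
The compatibility rule for bipolar and tetrapolar systems can be worked through by enumeration. The Python below is a sketch under stated assumptions: a mating type is a mapping from factor name to allele, two strains are treated as compatible only if they differ at every factor, and the progeny of a cross are assumed to occur in equal proportions (which presumes unlinked factors and no recombination within a factor). The allele names are hypothetical.

```python
from itertools import product


def compatible(strain_x, strain_y):
    """Compatible only if the alleles differ at every mating-type factor."""
    return all(strain_x[factor] != strain_y[factor] for factor in strain_x)


def sib_compatibility(parent_alleles):
    """Fraction of ordered pairs of sibling progeny that are compatible.

    parent_alleles maps each factor to the two alleles carried by the
    parents of the cross, e.g. {"A": ("A1", "A2")}.
    """
    factors = sorted(parent_alleles)
    progeny = [
        dict(zip(factors, combo))
        for combo in product(*(parent_alleles[f] for f in factors))
    ]
    pairs = [(p, q) for p in progeny for q in progeny]
    return sum(compatible(p, q) for p, q in pairs) / len(pairs)


print(sib_compatibility({"A": ("A1", "A2")}))                      # bipolar
print(sib_compatibility({"A": ("A1", "A2"), "B": ("B1", "B2")}))   # tetrapolar
```

Requiring differences at two unlinked factors rather than one halves the fraction of compatible sibling pairs in this enumeration, consistent with the statement that the system promotes outbreeding.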

Parasexual reproduction provides a further means for shuffling genetic material
between cells. This process allows recombination of parental DNA without involvement of
mating types or gametes. Parasexual fusion occurs by hyphal fusion giving rise to a common
cytoplasm containing different nuclei. The two nuclei can divide independently in the resulting
heterokaryon but occasionally fuse. Fusion is followed by haploidization, which can involve
loss of chromosomes and mitotic crossing over between homologous chromosomes. Protoplast
fusion is a form of parasexual reproduction.

Within the above four classes, fungi are also classified by vegetative
compatibility group. Fungi within a vegetative compatibility group can form heterokaryons
with each other. Thus, for exchange of genetic material between different strains of fungi, the
fungi are usually prepared from the same vegetative compatibility group. However, some
genetic exchange can occur between fungi from different incompatibility groups as a result of
parasexual reproduction (see Timberlake et al., US 5,605,820). Further, as discussed
elsewhere, the natural vegetative compatibility group of fungi can be expanded as a result of
shuffling.

Several isolates of Aspergillus nidulans, A. flavus, A. fumigatus, Penicillium
chrysogenum, P. notatum, Cephalosporium chrysogenum, Neurospora crassa, and Aureobasidium
pullulans have been karyotyped. Genome sizes generally range between 20 and 50 Mb among
the Aspergilli. Differences in karyotypes often exist between similar strains and are also
caused by transformation with exogenous DNA. Filamentous fungal genes contain introns,
usually ~50-100 bp in size, with similar consensus 5' and 3' splice sequences. Promotion and
termination signals are often cross-recognizable, enabling the expression of a gene/pathway
from one fungus (e.g., A. nidulans) in another (e.g., P. chrysogenum).

The major components of the fungal cell wall are chitin (or chitosan), β-glucan,
and mannoproteins. Chitin and β-glucan form the scaffolding; mannoproteins are interstitial
components which dictate the wall's porosity, antigenicity and adhesion. Chitin synthetase
catalyzes the polymerization of [...]
forming linear strands running antiparallel; β-(1,3)-glucan synthetase catalyzes the
homopolymerization of glucose.

One general goal of shuffling is to evolve fungi to become useful hosts for [...]
Neurospora are generally the fungal organisms of choice to serve as hosts for such
manipulations because of their sexual cycles and well-established use in classical and molecular
genetics. Another general goal is to improve the capacity of fungi to make specific
compounds (e.g., antibacterials (penicillins, cephalosporins), antifungals (e.g., echinocandins,
aureobasidins), and wood-degrading enzymes). There is some overlap between these general
goals, and thus, some desired properties are useful for achieving both goals.

One desired property is the introduction of meiotic apparatus into fungi
presently lacking a sexual cycle (see Sharon et al., Mol. Gen. Genet. 251, 60-68 (1996)). A
scheme for introducing a sexual cycle into the fungus P. chrysogenum (a fungus imperfecti) is
shown in Fig. 6. Subpopulations of protoplasts are formed from A. nidulans (which has a
sexual cycle) and P. chrysogenum, which does not. The two strains preferably bear different
markers. The A. nidulans protoplasts are killed by treatment with UV or hydroxylamine. The
two subpopulations are fused to form heterokaryons. In some heterokaryons, nuclei fuse, and
some recombination occurs. Fused cells are cultured under conditions to generate new cell
walls and then to allow sexual recombination to occur. Cells with recombinant genomes are
then selected (e.g., by selecting for complementation of auxotrophic markers present on the
respective parent strains). Cells with hybrid genomes are more likely to have acquired the
genes necessary for a sexual cycle. Protoplasts of cells can then be crossed with killed
protoplasts of a further population of cells known to have a sexual cycle (the same or different
as the previous round) in the same manner, followed by selection for cells with hybrid
genomes.

Another desired property is the production of a mutator strain of fungi. Such a
fungus can be produced by shuffling a fungal strain containing a marker gene with one or more
mutations that impair or prevent expression of a functional product. Shufflants are propagated
under conditions that select for expression of the positive marker (while allowing a small
amount of residual growth without expression). Shufflants growing fastest are selected to
form the starting materials for the next round of shuffling.

Another desired property is to expand the host range of a fungus so it can form
heterokaryons with fungi from other vegetative compatibility groups. Incompatibility
between species results from the interactions of specific alleles at different incompatibility loci
(such as the "het" loci). If two strains undergo hyphal anastomosis, a lethal cytoplasmic
incompatibility reaction may occur if the strains differ at these loci. Strains must carry
identical loci to be entirely compatible. Several of these loci have been identified in various
species, and the incompatibility effect is somewhat additive (hence, "partial incompatibility"
can occur). Some tolerant and het-negative mutants have been described for these organisms
(e.g., Dales & Croft, J. Gen. Microbiol. 136, 1717-1724 (1990)). Further, a tolerance gene
(tol) has been reported, which suppresses mating-type heterokaryon incompatibility. Shuffling
is performed between protoplasts of strains from different incompatibility groups. A preferred
format uses a live acceptor strain and a UV-irradiated dead donor strain. The UV
irradiation serves to introduce mutations into DNA, inactivating het genes. The two strains
should bear different genetic markers. Protoplasts of the two strains are fused, and cells are
regenerated and screened for complementation of markers. Subsequent rounds of shuffling and
selection can be performed in the same manner by fusing the cells surviving screening with
protoplasts of a fresh population of donor cells. Similar to other procedures noted herein, the
cells resulting from regeneration of the protoplasts are optionally re-fused by protoplasting and
regenerated into cells one or more times prior to any selection step to increase the diversity of
the resulting population of cells to be screened.

Another desired property is the introduction of multiple-allelomorph
heterothallism

property. This mating system allows outbreeding without self-breeding. Such a mating
system can be introduced

Gasteromycetes or Hymenomycetes, which have such a system.

Another desired property is spontaneous formation of protoplasts to facilitate
use of a fungal strain as a shuffling host. Here, the fungus to be evolved is typically
mutagenized. Spores of the fungus to be evolved are briefly treated with a cell-wall degrading
agent for a time insufficient for complete protoplast formation, and are mixed with protoplasts
from other strain(s) of fungi. Protoplasts formed by fusion of the two different subpopulations
are identified by genetic or other selection or screening as described above. These protoplasts
are used to regenerate mycelia and then spores, which form the starting material for the next
round of shuffling. In the next round, at least some of the surviving spores are treated with
cell-wall removing enzyme but for a shorter time than the previous round. After treatment,
the partially stripped cells are labeled with a first label. These cells are then mixed with
protoplasts, which may derive from other cells surviving selection in a previous round, or from
a fresh strain of fungi. These protoplasts are physically labeled with a second label. After

These fusants are used to generate mycelia and spores for the next round of shuffling, and so
forth. Eventually, progeny that spontaneously form protoplasts (i.e., without addition of cell
wall degrading agent) are identified. As with other procedures noted herein, cells or
protoplasts can be reiteratively fused and regenerated prior to performing any selection step to
increase the diversity of the resulting cells or protoplasts to be screened. Similarly, selected
cells or protoplasts can be reiteratively fused and regenerated for one or several cycles without
imposing selection on the resulting cellular or protoplast populations, thereby increasing the
diversity of cells or protoplasts which are eventually screened. This process of performing
multiple cycles of recombination interspersed with selection steps can be reiteratively repeated
as desired.

Another desired property is the acquisition and/or improvement of genes
encoding enzymes in biosynthetic pathways, genes encoding transporter proteins, and genes
encoding proteins involved in metabolic flux control. In this situation, genes of the pathway
can be introduced into the fungus to be evolved either by genetic exchange with another strain
of fungus possessing the pathway or by introduction of a fragment library from an organism
possessing the pathway. Genetic material of these fungi can then be subjected to further
shuffling and screening/selection by the various procedures discussed in this application.
Shufflant strains of fungi are selected/screened for production of the compound produced by
the metabolic pathway or precursors thereof.

Another desired property is increasing the stability of fungi to extreme
conditions such as heat. In this situation, genes conferring stability can be acquired by
exchanging DNA with or transforming DNA from a strain that already has such properties.
Alternatively, the strain to be evolved can be subjected to random mutagenesis. Genetic
material of the fungus to be evolved can be shuffled by any of the procedures described in this
application, with shufflants being selected by surviving exposure to extreme conditions.

Another desired property is capacity of a fungus to grow under altered
nutritional requirements (e.g., growth on particular carbon or nitrogen sources). Altering
nutritional requirements is particularly valuable, e.g., for natural isolates of fungi that produce
valuable commercial products but have esoteric and therefore expensive nutritional
requirements. The strain to be evolved undergoes genetic exchange and/or transformation with
DNA from a strain that has the desired nutritional requirements. The fungus to be evolved can
then optionally be subjected to further shuffling as described in this application, with
recombinant strains being selected for capacity to grow in the desired nutritional
circumstances. Optionally, the nutritional circumstances can be varied in successive rounds of
shuffling, starting at close to the natural requirements of the fungus to be evolved and in
subsequent rounds approaching the desired nutritional requirements.

Another desired property is acquisition of natural competence in a fungus. The
procedure for acquisition of natural competence by shuffling is generally described in
PCT/US97/04494. The fungus to be evolved typically undergoes genetic exchange or
transformation with DNA from a bacterial strain or fungal strain that already has this property.
Cells with recombinant genomes are then selected by capacity to take up a plasmid bearing a
selective marker. Further rounds of recombination and selection can be performed using any
of the procedures described above.

Another desired property is reduced or increased secretion of proteases and
DNase. In this situation, the fungus to be evolved can acquire DNA by exchange or
transformation from another strain known to have the desired property. Alternatively, the
fungus to be evolved can be subject to random mutagenesis. The fungus to be evolved is
shuffled as above. The presence of such enzymes, or lack thereof, can be assayed by
contacting the culture media from individual isolates with a fluorescent molecule tethered to a
support via a peptide or DNA linkage. Cleavage of the linkage releases detectable
fluorescence to the media.

Another desired property is producing fungi with altered transporters (e.g.,
MDR). Such altered transporters are useful, for example, in fungi that have been evolved to
produce new secondary metabolites, to allow entry of precursors required for synthesis of the
new secondary metabolites into a cell, or to allow efflux of the secondary metabolite from the
cell. Transporters can be evolved by introduction of a library of transporter variants into
fungal cells and allowing the cells to recombine by sexual or parasexual recombination. To
evolve a transporter with capacity to transport a precursor into the cells, cells are propagated
in the presence of precursor, and cells are then screened for production of metabolite. To
evolve a transporter with capacity to export a metabolite, cells are propagated under
conditions supporting production of the metabolite, and screened for export of metabolite to
culture medium.

A general method of fungal shuffling is shown in Fig. 7. Spores from a frozen
stock, a lyophilized stock, or fresh from an agar plate are used to inoculate suitable liquid
medium (1). Spores are germinated, resulting in hyphal growth (2). Mycelia are harvested
and washed by filtration

to enhance protoplast formation (3). Protoplasting is performed in an osmotically stabilizing
medium (e.g., 1 M NaCl/20 mM MgSO4, pH 5.8) by the addition of cell wall-degrading
enzyme (e.g., Novozyme 234) (4). Cell wall degrading enzyme is removed by repeated
washing with osmotically stabilizing solution (5). Protoplasts can be separated from mycelia,
debris and spores by filtration through Miracloth, and density centrifugation (6). Protoplasts
are harvested by centrifugation and resuspended to the appropriate concentration. This step
may lead to some protoplast fusion (7). Fusion can be stimulated by addition of PEG (e.g.,
PEG 3350), and/or repeated centrifugation and resuspension with or without PEG.
Electrofusion can also be performed (8). Fused protoplasts can optionally be enriched from
unfused protoplasts by sucrose gradient sedimentation (or other methods of screening
described above). Fused protoplasts can optionally be treated with ultraviolet irradiation to
stimulate recombination (9). Protoplasts are cultured on osmotically stabilized agar plates to
regenerate cell walls and form mycelia (10). The mycelia are used to generate spores (11),
which are used as the starting material in the next round of shuffling (12).

Selection for a desired property can be performed either on regenerated 
mycelia or spores derived therefrom. 
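
The recursive character of the cycle described above (fusion and recombination, regeneration, selection, then reuse of the survivors as starting material) can be illustrated with a toy computational model. The sketch below is only an abstraction for illustration and is not part of the described laboratory method: genomes are represented as lists of numbers, recombination as a uniform crossover between two randomly paired survivors, and the names `recombine`, `shuffle_round` and `recursive_shuffle`, the fitness function, and all parameter values are hypothetical.

```python
import random

def recombine(parent_a, parent_b, rng):
    """Uniform crossover: each position is taken from one parent at random."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

def shuffle_round(population, fitness, keep, rng):
    """One round: keep the `keep` fittest genomes (keep >= 2), then pair them
    at random to rebuild a population of the original size. Survivors are
    carried over unchanged, so the best score cannot fall between rounds."""
    survivors = sorted(population, key=fitness, reverse=True)[:keep]
    offspring = [recombine(*rng.sample(survivors, 2), rng)
                 for _ in range(len(population) - keep)]
    return survivors + offspring

def recursive_shuffle(population, fitness, rounds, keep, seed=0):
    """Repeat recombination and selection, recording the best score per round."""
    rng = random.Random(seed)
    best_per_round = [max(map(fitness, population))]
    for _ in range(rounds):
        population = shuffle_round(population, fitness, keep, rng)
        best_per_round.append(max(map(fitness, population)))
    return population, best_per_round
```

In this model, a call such as `recursive_shuffle(pop, sum, rounds=5, keep=10)` treats the sum of a genome's entries as a stand-in for the selected property. Real selection or screening would of course be an assay on regenerated mycelia or spores rather than a function call.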
In an alternative method, protoplasts are formed by inhibition of one or more
enzymes required for cell wall synthesis (see Fig. 8). The inhibitor should be fungistatic rather
than fungicidal under the conditions of use. Examples of inhibitors include antifungal
compounds described by, e.g., Georgopapadakou & Walsh, Antimicrob. Ag. Chemother. 40,
279-291 (1996); Lyman & Walsh, Drugs 44, 9-35 (1992). Other examples include chitin
synthase inhibitors (polyoxin or nikkomycin compounds) and/or glucan synthase inhibitors
(e.g., echinocandins, papulacandins, pneumocandins). Inhibitors should be applied in
osmotically stabilized medium. Cells stripped of their cell walls can be fused or otherwise
employed as donors or hosts in genetic transformation/strain development programs. A
possible scheme utilizing this method reiteratively is outlined in Figure 8.

In a further variation, protoplasts are prepared using strains of fungi that are
genetically deficient or compromised in their ability to synthesize intact cell walls (see Fig. 9).
Such mutants are generally referred to as fragile, osmotic-remedial, or cell wall-less, and are
obtainable from strain depositories. Examples of such strains include Neurospora crassa os
mutants (Selitrennikoff, Antimicrob. Agents. Chemother. 23, 757-765 (1983)). Some such
mutations are temperature-sensitive. Temperature-sensitive strains can be propagated at the
permissive temperature for purposes of selection and amplification and at a nonpermissive
temperature for purposes of protoplast formation and fusion. A temperature-sensitive
Neurospora crassa os strain has been described which propagates as protoplasts when grown
in osmotically stabilizing medium containing sorbose and polyoxin at nonpermissive
temperature but generates whole cells on transfer to medium containing sorbitol at a
permissive temperature. See US 4,873,196.

Other suitable strains can be produced by targeted mutagenesis of genes
involved in chitin synthesis, glucan synthesis and other cell wall-related processes. Examples
of such genes include CHT1, CHT2 and CAL1 (or CSD2) of Saccharomyces cerevisiae and
Candida spp. (Georgopapadakou & Walsh 1996); ETG1/FKS1/CND1/CWH53/PBR1 and
homologs in S. cerevisiae, Candida albicans, Cryptococcus neoformans, Aspergillus
fumigatus, ChvA/NdvA of Agrobacterium and Rhizobium. Other examples are orlA, orlB, orlC,
orlD, tsE, and bimG of Aspergillus nidulans (Borgia, J. Bacteriol. 174, 377-389 (1992)).
Strains of A. nidulans containing OrlA1 or mutations lyse at
Lysis of these strains may be prevented by osmotic stabilization, and the mutations may be
complemented by the addition of N-acetylglucosamine (GlcNAc). BimG11 mutations are ts for
a type 1 protein phosphatase (germlings of strains carrying this mutation lack chitin, and
conidia swell and lyse). Other suitable genes

Aspergillus fumigatus, chs1 and chs2 of Neurospora crassa; Phycomyces blakesleeanus MM
and chs1, 2 and 3 of S. cerevisiae. Chs1 is a non-essential repair enzyme; chs2 is involved in
septum formation and chs3 is involved in cell wall maturation and bud ring formation.

Other useful strains include S. cerevisiae CLY (cell lysis) mutants such as ts
strains (Paravicini et al., Mol. Cell Biol. 12, 4896-4905 (1992)), and the CLY 15 strain which
harbors a PKC1 gene deletion. Other useful strains include strain VY 1160 containing a ts
mutation in srb (encoding actin) (Schade et al., Acta Histochem. Suppl. 41, 193-200 (1991))
and a strain with an ses mutation which results in increased sensitivity to cell-wall digesting
enzymes isolated from snail gut (Metha & Gregory, Appl. Environ. Microbiol. 41, 992-999
(1981)). Useful strains of C. albicans include those with mutations in chs1, chs2 or chs3
(encoding chitin synthetases), such as osmotic remedial conditional lethal mutants described by
Payton & de Tiani, Curr. Genet. 17, 293-296 (1990); C. utilis mutants with increased
sensitivity to cell-wall digesting enzymes isolated from snail gut (Metha & Gregory, 1981


cell wall at 37°C, but at 22°C produce a cell wall. 

Targeted mutagenesis can be achieved by transforming cells with a positive-negative
selection vector containing homologous regions flanking a segment to be targeted, a
positive selection marker between the homologous regions and a negative selection marker
outside the homologous regions (see Capecchi, US 5,627,059). In a variation, the negative
selection marker can be an antisense transcript of the positive selection marker (see US
5,527,674).

Other suitable cells can be selected by random mutagenesis or shuffling
procedures in combination with selection. For example, a first subpopulation of cells is
mutagenized, allowed to recover from mutagenesis, subjected to incomplete degradation of
cell walls and then contacted with protoplasts of a second subpopulation of cells. Hybrid
cells bearing markers from both subpopulations are identified (as described above) and used as
the starting materials in a subsequent round of shuffling. This selection scheme selects both
for cells with capacity for spontaneous protoplast formation and for cells with enhanced
recombinogenicity.

In a further variation, cells having capacity for spontaneous protoplast
formation can be crossed with cells having enhanced recombinogenicity evolved using other
methods of the invention. The hybrid cells are particularly suitable hosts for whole genome
shuffling.

Cells with mutations in enzymes involved in cell wall synthesis or maintenance
can undergo fusion simply as a result of propagating the cells in osmotic-protected culture due
to spontaneous protoplast formation. If the mutation is conditional, cells are shifted to a
nonpermissive condition. Protoplast formation and fusion can be accelerated by addition of
promoting agents, such as PEG or an electric field (see Philipova & Venkov, Yeast 6, 205-212
(1990); Tsoneva et al., FEMS Microbiol Utt. 5\ 7 61-65 (19*9)). 

5. Targeted Shuffling - Hot Spots
In one aspect, targeted homologous genes are cloned into specific regions of
the genome (e.g., by homologous recombination or other targeting procedures) which are
known to be recombination "hot spots" (i.e., regions showing elevated levels of recombination
compared to the average level of recombination observed across an entire genome), or known
to be proximal to such hot spots. The resulting recombinant strains are mated recursively.
During meiotic recombination, homologous recombinant genes recombine, thereby increasing
the diversity of the genes. After several cycles of recombination by recursive mating, the
resulting cells are screened.

6. Shuffling Methods in Yeast 
Yeasts are subspecies of fungi that grow as single cells. Yeasts are used for the
production of fermented beverages and leavening, for production of ethanol as a fuel, low
molecular weight compounds, and for the heterologous production of proteins and enzymes
(see accompanying list of yeast strains and their uses). Commonly used strains of yeast
include Saccharomyces cerevisiae, Pichia sp., Candida sp. and Schizosaccharomyces pombe.

Several types of vectors are available for cloning in yeast including integrative
plasmid (YIp), yeast replicating plasmid (YRp, such as the 2µ circle based vectors), yeast
episomal plasmid (YEp), yeast centromeric plasmid (YCp), or yeast artificial chromosome
(YAC). Each vector can carry markers useful to select for the presence of the plasmid such as
LEU2, URA3, and HIS3, or the absence of the plasmid such as URA3 (a gene that is toxic to
cells grown in the presence of 5-fluoroorotic acid).

Many yeasts have a sexual cycle and asexual (vegetative) cycles. The sexual
cycle involves the recombination of the whole genome of the organism each time the cell
passes through meiosis. For example, when diploid cells of S. cerevisiae are exposed to
nitrogen and carbon limiting conditions, diploid cells undergo meiosis to form asci. Each
ascus holds four haploid spores, two of mating type "a" and two of mating type "α". Upon
return to rich medium, haploid spores of opposite mating type mate to form diploid cells once
again. Ascospores of opposite mating type can mate within the ascus, or if the ascus is
degraded, for example with zymolyase, the haploid cells are liberated and can mate with spores
from other asci. This sexual cycle provides a format to shuffle endogenous genomes of yeast
and/or exogenous fragment libraries inserted into yeast vectors. This process results in
swapping or accumulation of hybrid genes, and in the shuffling of homologous sequences
shared by mating cells.

Yeast strains having mutations in several known genes have properties useful
for shuffling. These properties include increasing the frequency of recombination and
increasing the frequency of spontaneous mutations within a cell. These properties can be the
result of mutation of a coding sequence or altered expression (usually overexpression) of a
wildtype coding sequence. The HO nuclease effects the transposition of HMLa/α and
HMRa/α to the MAT locus resulting in mating type switching. Mutants in the gene encoding
this enzyme do not switch their mating type and can be employed to force crossing between

to prevent inbreeding of starter strains. PMS1, MLH1, MSH2, MSH6 are involved in
mismatch repair. Mutations in these genes all have a mutator phenotype (Chambers et al.,
Mol. Cell Biol. 16, 6110-6120 (1996)). Mutations in TOP3 DNA topoisomerase have a
6-fold enhancement of interchromosomal homologous recombination (Bailis et al., Molecular
and Cellular Biology 12, 4988-4993 (1992)). The RAD50-57 genes confer resistance to
radiation. RAD3 functions in excision of pyrimidine dimers. RAD52 functions in gene
conversion. RAD50, MRE11, XRS2 function in both homologous recombination and
illegitimate recombination. HOP1, RED1 function in early meiotic recombination
(Mao-Draayer, Genetics 144, 71-86). Mutations in either HOP1 or RED1 reduce double
stranded breaks at the HIS2 recombination hotspot. Strains deficient in these genes are useful
for maintaining stability in hyper recombinogenic constructs such as tandem expression
libraries carried on YACs. Mutations in HPR1 are hyperrecombinogenic. HDF1 has DNA
end binding activity and is involved in double stranded break repair and V(D)J recombination.
Strains bearing this mutation are useful for transformation with random genomic fragments by
either protoplast fusion or electroporation. Kar-1 is a dominant mutation that prevents
karyogamy. Kar-1 mutants are useful for the directed transfer of single chromosomes from a
donor to a recipient strain. This technique has been widely used in the transfer of YACs
between strains, and is also useful in the transfer of evolved genes/chromosomes to other
organisms (Markie, YAC Protocols (Humana Press, Totowa, NJ, 1996)). HOT1 is an S.
cerevisiae recombination hotspot within the promoter and enhancer region of the rDNA
repeat sequences. This locus induces mitotic recombination at adjacent sequences,
presumably due to its high level transcription. Genes and/or pathways inserted under the
transcriptional control of this region undergo increased mitotic recombination. The regions
surrounding the arg4 and his4 genes are also recombination hot spots, and genes cloned in
these regions have an increased probability of undergoing recombination during meiosis.
Homologous genes can be cloned in these regions and shuffled in vivo by recursively mating
the recombinant strains. CDC2 encodes polymerase δ and is necessary for mitotic gene
conversion. Overexpression of this gene can be used in a shuffler or mutator strain. A
temperature sensitive mutation in CDC4 halts the cell cycle at G1 at the restrictive
temperature and could be used to synchronize protoplasts for optimized fusion and subsequent
recombination.

As with filamentous fungi, the general goals of shuffling yeast include
improvement in yeast as a host organism for genetic manipulation, and as a production
apparatus for various compounds. One desired property in either case is to improve the
capacity of yeast to express and secrete a heterologous protein. The following example
describes the use of shuffling to evolve yeast to express and secrete increased amounts of
RNase A.

RNase A catalyzes the cleavage of the P-O5' bond of RNA specifically after
pyrimidine nucleotides. The enzyme is a basic 124 amino acid polypeptide that has 8 half
cystine residues, each required for catalysis. YEpWL-RNase A is a vector that effects the
expression and secretion of RNase A from the yeast S. cerevisiae, and yeast harboring this
vector secrete 1-2 mg of recombinant RNase A per liter of culture medium (del Cardayre et
al., Protein Engineering 8(3):261-273 (1995)). This overall yield is poor for a protein
heterologously expressed in yeast and can be improved at least 10-100 fold by shuffling. The
expression of RNase A is easily detected by several plate and microtitre plate assays (del
Cardayre & Raines, Biochemistry 33, 6031-6037 (1994)). Each of the described formats for
whole genome shuffling can be used to shuffle a strain of S. cerevisiae harboring
YEpWL-RNase A, and the resulting cells can be screened for the increased secretion of RNase
A into the medium. The new strains are cycled recursively through the shuffling format, until
sufficiently high levels of RNase A secretion are observed. The use of RNase A is particularly
useful since it not only requires proper folding and disulfide bond formation but also proper
glycosylation. Thus numerous components of the expression, folding, and secretion systems
can be optimized. The resulting strain is also evolved for improved secretion of other
heterologous proteins.

Another goal of shuffling yeast is to increase the tolerance of yeast to ethanol.
Such tolerance is useful both for the commercial production of ethanol, and for the production
of more alcoholic beers and wines. The yeast strain to be shuffled acquires genetic material by
exchange or transformation with other strain(s) of yeast, which may or may not be known to
have superior resistance to ethanol. The strain to be evolved is shuffled and shufflants are
selected for capacity to survive exposure to ethanol. Increasing concentrations of ethanol can
be used in successive rounds of shuffling. The same principles can be used to shuffle baking
yeasts for improved osmotolerance.
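
The use of increasing ethanol concentrations in successive rounds amounts to a selection schedule. The text does not specify concentrations or how they should rise, so the following is only a minimal sketch assuming a linear ramp; the function name `selection_schedule` and the example values are hypothetical.

```python
def selection_schedule(start, target, rounds):
    """Return one selection stringency per round, rising linearly from
    `start` to `target` (same units for both, e.g. % v/v ethanol).
    The linear ramp is an assumption made for illustration only."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    if rounds == 1:
        return [float(target)]
    step = (target - start) / (rounds - 1)
    return [start + i * step for i in range(rounds)]
```

For instance, `selection_schedule(8, 16, 5)` yields a stringency for each of five rounds that climbs in equal steps from 8 to 16. In practice the step size would likely be adjusted round by round according to how many shufflants survive.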

Another desired property of shuffled yeast is capacity to grow under desired
nutritional conditions. For example, it is useful for yeast to grow on cheap carbon sources such
as methanol, starch, molasses, cellulose, cellobiose, or xylose depending on availability. The
raOMpmiciptesMu^ 

Another desired property is capacity to produce secondary metabolites
naturally produced by filamentous fungi or bacteria. Examples of such secondary metabolites
are cyclosporin A, taxol, and cephalosporins. The yeast to be evolved undergoes genetic
exchange or is transformed with DNA from organism(s) that produce the secondary
metabolite. For example, fungi producing taxol include Taxomyces andreanae and
Pestalotiopsis microspora (Stierle et al., Science 260, 214-216 (1993); Strobel et al., Microbiol.
142, 435-440 (1996)). DNA can also be obtained from trees that naturally produce taxol
such as Taxus brevifolia. DNA encoding one enzyme in the taxol pathway, taxadiene
synthase, which it is believed catalyzes the committed step in taxol biosynthesis and may be
rate limiting in overall taxol production, has been cloned (Wildung & Croteau, J. Biol. Chem.
271, 9201-4 (1996)). The DNA is then shuffled, and shufflants are screened/selected for
production of the secondary metabolite. For example, taxol production can be monitored
using antibodies to taxol, by mass spectroscopy or UV spectrophotometry. Alternatively,
production of intermediates in taxol synthesis or enzymes in the taxol synthetic pathway can be
monitored. Concetti & Ripani, Biol. Chem. Hoppe Seyler 375, 419-23 (1994). Other
examples of secondary metabolites are polyols, amino acids, polyketides, non-ribosomal
polypeptides, ergosterol, carotenoids, terpenoids, sterols, vitamin E, and the like.
Another desired property is to increase the flocculence of yeast to facilitate
separation in preparation of ethanol. Yeast can be shuffled by any of the procedures noted
above with selection for shuffled yeast forming the largest clumps.

7. Exemplary procedure for yeast protoplasting
Protoplast preparation in yeast is reviewed by Morgan, in Protoplasts
(Birkhauser Verlag, Basel, 1983). Fresh cells (~10^8) are washed with buffer, for example 0.1
M potassium phosphate, then resuspended in this same buffer containing a reducing agent,
such as 50 mM DTT, incubated for 1 h at 30°C with gentle agitation, and then washed again
with buffer to remove the reducing agent. These cells are then resuspended in buffer
containing a cell wall degrading enzyme, such as Novozyme 234 (1 mg/mL), and any of a
variety of osmotic stabilizers, such as sucrose, sorbitol, NaCl, KCl, MgSO4, MgCl2, or NH4Cl
at any of a variety of concentrations. These suspensions are then incubated at 30°C with gentle
shaking (~60 rpm) until protoplasts are released. To generate protoplasts that are more likely
to produce productive fusants, several strategies are possible.

Protoplast formation can be increased if the cell cycle of the protoplasts has
been synchronized to be halted at G1. In the case of S. cerevisiae this can be accomplished by
the addition of mating factors, either a or α (Curran & Carter, J. Gen. Microbiol. 129,
1589-1591 (1983)). These peptides act as adenylate cyclase inhibitors which, by decreasing
the cellular level of cAMP, arrest the cell cycle at G1. In addition, sex factors have been
shown to induce the weakening of the cell wall in preparation for the sexual fusion of a and α
cells (Crandall & Brock, Bacteriol. Rev. 32, 139-163 (1968); Osumi et al., Arch. Microbiol.
97, 27-38 (1974)). Thus in the preparation of protoplasts, cells can be treated with mating
factors or other known inhibitors of adenylate cyclase, such as leflunomide or the killer toxin
from K. lactis, to arrest them at G1 (Sugisaki et al., Nature 304, 464-466 (1983)). Then after
fusing of the protoplasts (step 2), cAMP can be added to the regeneration medium to induce
S-phase and DNA synthesis. Alternatively, yeast strains having a temperature sensitive
mutation in the CDC4 gene can be used, such that cells could be synchronized and arrested at
G1. After fusion, cells are returned to the permissive temperature so that DNA synthesis and
growth resume.

Once suitable protoplasts have been prepared, it is necessary to induce fusion
by physical or chemical means. An equal number of protoplasts of each cell type is mixed in
phosphate buffer (0.2 M, pH 5.8, 2 x 10^8 cells/mL) containing an osmotic stabilizer, for
example 0.8 M NaCl, and PEG 6000 (33% w/v) and then incubated at 30°C for 5 min while
fusion occurs. Polyols, or other compounds

are then washed and resuspended in the osmotically stabilized buffer lacking PEG and

or screened for a desired property.
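
The quantities given for the fusion mixture (equal numbers of each cell type, 2 x 10^8 cells/mL, 0.8 M NaCl, 33% w/v PEG 6000) lend themselves to a short worked calculation. The sketch below is an illustration only: it assumes that 2 x 10^8 cells/mL is the total density of the mixture, that the two protoplast stocks are simply diluted into the final volume, and it uses a hypothetical helper named `fusion_mix`. The stock densities in the usage note are invented example values.

```python
NACL_G_PER_MOL = 58.44  # molar mass of NaCl in g/mol

def fusion_mix(density_a, density_b, final_volume_ml,
               total_density=2e8, nacl_molar=0.8, peg_percent_wv=33.0):
    """Volumes (mL) of two protoplast stocks that contribute equal cell
    numbers at `total_density` cells/mL (assumed to be the combined density),
    plus grams of NaCl and PEG needed for the final volume."""
    cells_each = total_density * final_volume_ml / 2
    vol_a = cells_each / density_a
    vol_b = cells_each / density_b
    if vol_a + vol_b > final_volume_ml:
        raise ValueError("stocks too dilute; concentrate protoplasts first")
    return {
        "stock_a_ml": vol_a,
        "stock_b_ml": vol_b,
        # mol/L x g/mol x L gives grams of NaCl
        "nacl_g": nacl_molar * NACL_G_PER_MOL * final_volume_ml / 1000,
        # % w/v is grams per 100 mL
        "peg_g": peg_percent_wv / 100 * final_volume_ml,
    }
```

With stocks at 1 x 10^9 and 5 x 10^8 protoplasts/mL and a 1 mL final volume, `fusion_mix(1e9, 5e8, 1.0)` calls for 0.1 mL and 0.2 mL of the two stocks. A bench protocol would more likely pellet the protoplasts and resuspend them in the PEG solution, so the figures are a planning aid rather than a recipe.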

8. Shuffling Methods Using Artificial Chromosomes
Yeast artificial chromosomes (YACs) are yeast vectors into which very large
DNA fragments (e.g., 50-2000 kb) can be cloned (see, e.g., Monaco & Larin, Trends.
Biotech. 12(7), 280-286 (1994); Ramsay, Mol. Biotechnol. 1(2), 181-201 (1994); Huxley,
Genet. Eng. 16, 65-91 (1994); Jakobovits,

(1994)). These vectors have telomeres (Tel), a centromere (Cen), an autonomously
replicating sequence (ARS), and can have genes for positive (e.g., TRP1) and negative (e.g.,
URA3) selection. YACs are maintained, replicated, and segregated as other yeast
chromosomes through both meiosis and mitosis, thereby providing a means to expose cloned
DNA to true meiotic recombination.

vivo. The substrates for shuffling are typically large fragments from 20 kb to 2 Mb The 
fragments can be random fragments or can be fragments known to encode a desirable 
property, For example, a fragment might include an ^ operoh of genes involved in production of 
antibiotics. Libraries^ also include whole genomes or chromosomes. Viral genomes and 
some bacterial genomes can be cloned intact into a single YAG, In some libraries fragments 
are obtained from a single organism. Other Ubraries include fragment variants, as where some 
libraries are obtained from different individuals or species. Fragment variants can also be 
generated by induced mutation. Typically, genes witl.m fragments are expressed from 
naturally associated regulatory sequences within yeast However, alternatively, individual 
30 genes can be linked to yeast regulatory elements to form an expression cassette and a 

concatemer of such cassettes, each containing a different gene, can be inserted into a YAC. 

In some instances, fragments are incorporated into the yeast genome and
shuffling is used to evolve improved yeast strains. In other instances, fragments remain as
components of YACs throughout the shuffling process, and after acquisition of a desired
property, the YACs are transferred to a desired recipient cell.

9. Methods of Evolving Yeast Strains 
Fragments are cloned into a YAC vector, and the resulting YAC library is
transformed into competent yeast cells. Transformants containing a YAC are identified by
selecting for a positive selection marker present on the YAC. The cells are allowed to recover
and are then pooled. Thereafter, the cells are induced to sporulate by transferring the cells
from rich medium to nitrogen and carbon limiting medium. In the course of sporulation, cells
undergo meiosis. Spores are then induced to mate by return to rich media. Optionally, asci
are lysed to liberate spores, so that the spores can mate with other spores originating from
other asci. Mating results in recombination between YACs bearing different inserts, and
between YACs and natural yeast chromosomes. The latter can be promoted by irradiating
spores with ultraviolet light. Recombination can give rise to new phenotypes either as a result
of genes expressed by fragments on the YACs or as a result of recombination with host genes,
or both.

After induction of recombination between YACs and natural yeast
chromosomes, YACs are often eliminated by selecting against a negative selection marker on
the YACs. For example, YACs containing the marker URA3 can be selected against by
propagation on media containing 5-fluoroorotic acid. Any exogenous or altered genetic
material that remains is contained within natural yeast chromosomes. Optionally, further
rounds of recombination between natural yeast chromosomes can be performed after
elimination of YACs. Optionally, the same or different library of YACs can be transformed
into the cells, and the above steps repeated. By recursively repeating this process, the
diversity of the population is increased prior to screening.

After elimination of YACs, yeast are then screened or selected for a desired
property. The property can be a new property conferred by transferred fragments, such as
production of an antibiotic. The property can also be an improved property of the yeast such
as improved capacity to express or secrete an exogenous protein, improved
recombinogenicity, improved stability to temperature or solvents, or other property required
of commercial or research strains of yeast.

Yeast strains surviving selection/screening are then subject to a further round
of recombination. Recombination can be exclusively between the chromosomes of yeast
surviving selection/screening. Alternatively, a library of fragments can be introduced into the
yeast cells and recombined with endogenous yeast chromosomes as before. This library of
fragments can be the same as or different from the library used in the previous round of
transformation. For example, the YACs could contain a library of genomic DNA isolated
from a pool of the improved strains obtained in the earlier steps. YACs are eliminated as
before, followed by additional rounds of recombination and/or transformation with further
YAC libraries. Recombination is followed by another round of selection/screening as above.
Further rounds of recombination/screening can be performed as needed until a yeast strain has
evolved to acquire the desired property.
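
The recursive cycle described above (recombine, select, repeat) can be illustrated with a toy computational model. The following is only a sketch under simplifying assumptions that are not part of the method itself: each genome is reduced to a list of 0/1 alleles, one single-point crossover stands in for meiotic recombination, and fitness is simply the count of beneficial alleles. The function names `crossover`, `shuffle_round` and `evolve` are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: a crude stand-in for one meiotic exchange."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def shuffle_round(population, keep, rng):
    """One cycle: recombine random pairs, then select the fittest."""
    progeny = []
    for _ in range(len(population)):
        parent_a, parent_b = rng.sample(population, 2)
        progeny.append(crossover(parent_a, parent_b, rng))
    # Toy fitness: number of beneficial alleles (1s) in the genome.
    # Parents stay in the pool, so the best score never decreases.
    pool = population + progeny
    return sorted(pool, key=sum, reverse=True)[:keep]


def evolve(population, rounds, keep, seed=0):
    """Repeat recombination and selection for a fixed number of rounds."""
    rng = random.Random(seed)
    for _ in range(rounds):
        population = shuffle_round(population, keep, rng)
    return population


if __name__ == "__main__":
    # Five starting strains, each carrying one distinct beneficial allele.
    start = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
    final = evolve(start, rounds=4, keep=5)
    print("best allele count:", max(sum(g) for g in final))
```

The model ignores YAC curing, mating type and linkage; it is meant only to show why repeated pooling and recombination between rounds of selection lets beneficial changes from different parents accumulate in one strain.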

An exemplary scheme for evolving yeast by introduction of a YAC library is
shown in Fig. 10. The first part of the figure shows yeast conta[...]
genome and a YAC library of fragments representing variants of a sequence. The library is
transformed into the cells to yield 100-1000 colonies per µg DNA. Most transformed yeast
cells now harbor a single YAC as well as endogenous chromosomes. Meiosis is induced by
growth on nitrogen and carbon limiting medium. In the course of meiosis, the YACs
recombine with other chromosomes in the same cell. Haploid spores resulting from meiosis
mate and regenerate diploid forms. The diploid forms now harbor recombinant
chromosomes, parts of which come from endogenous chromosomes and parts from YACs.
Optionally, the YACs can now be cured from the cells by selecting against a negative selection
[...]
screened or selected for a desired property. Cells surviving selection/screening are
transformed with another YAC library to start another shuffling cycle.

10. Method of Evolving YACs for Transfer to Recipient Strain
These methods are based in part on the fact that multiple YACs can be
harbored in the same yeast cell, and YAC-YAC recombination is known to occur (Green &
Olson, Science 250, 94-98 (1990)). Inter-YAC recombination provides a format in which
families of homologous genes harbored on fragments of >20 kb can be shuffled in vivo.
The starting population of DNA fragments show sequence similarity with each other but differ
as a result of, for example, induced, allelic or species diversity. Often DNA fragments are
known or suspected to encode multiple genes th[...]

The fragments are cloned into a YAC and transfo[...]
positive selection for transformants. The transformants are induced to sporulate, as a result of
which chromosomes undergo meiosis. The cells are then mated. Most of the resulting diploid
cells now carry two YACs, each having a different insert. These are again induced to sporulate
and mated. The resulting cells harbor YACs of recombined sequence. The cells can then be
screened or selected for a desired property. Typically, such selection occurs in the yeast strain
used for shuffling. However, if fragments being shuffled are not expressed in yeast, YACs can
be isolated and transferred to an appropriate cell type in which they are expressed for
screening. Examples of such properties include the synthesis or degradation of a desired
compound, increased secretion of a desired gene product, or other detectable phenotype.

Preferably, the YAC library is transformed into haploid a and haploid α cells.
These cells are then induced to mate with each other, i.e., they are pooled and induced to mate
by growth on rich medium. The diploid cells, each carrying two YACs, are then transferred to
sporulation medium. During sporulation, the cells undergo meiosis, and homologous
chromosomes recombine. In this case, the genes harbored in the YACs will recombine,
diversifying their sequences. The resulting haploid ascospores are then liberated from the asci
by enzymatic degradation of the asci wall or other available means, and the pooled liberated
haploid ascospores are induced to mate by transfer to rich medium. This process is repeated
for several cycles to increase the diversity of the DNA cloned into the YACs. The resulting
population of yeast cells, preferably in the haploid state, is either screened for improved
properties, or the diversified DNA is delivered to another host cell or organism for screening.

Cells surviving selection/screening are subjected to successive cycles of
pooling, sporulation, mating and selection/screening until the desired phenotype has been
observed. Recombination can be achieved simply by transferring cells from rich medium to
carbon and nitrogen limited medium to induce sporulation, and then returning the spores to
rich media to induce mating. Asci can be lysed to stimulate mating of spores originating from
different asci.

After YACs have been evolved to encode a desired property they can be
transferred to other cell types. Transfer can be by protoplast fusion, or retransformation with
isolated DNA. For example, transfer of YACs from yeast to mammalian cells is discussed by
Monaco & Larin, Trends in Biotechnology 12, 280-286 (1994); Montoliu et al., Reprod.
Fertil. Dev. 6, 577-84 (1994); Lamb et al., Curr. Opin. Genet. Dev. 5, 342-8 (1995).

An exemplary scheme for shuffling a YAC fragment library in yeast is shown in
Fig. 11. A library of YAC fragments representing genetic variants is transformed into yeast
that have diploid endogenous chromosomes. The transformed yeast continue to have diploid
endogenous chromosomes, plus a single YAC. The yeast are induced to undergo meiosis and
sporulate. The spores contain haploid genomes and are selected for those which contain a
YAC, using the YAC selective marker. The spores are induced to mate, generating diploid
cells. The diploid cells now contain two YACs bearing different inserts as well as diploid
endogenous chromosomes. The cells are again induced to undergo meiosis and sporulate.
During meiosis, recombination occurs between the YAC inserts, and recombinant YACs are
segregated to ascoytes. Some ascoytes thus contain haploid endogenous chromosomes plus a
YAC chromosome with a recombinant insert. The ascoytes mature to spores, which can mate
again, generating diploid cells. Some diploid cells now possess a diploid complement of
endogenous chromosomes plus two recombinant YACs. These cells can then be taken
through further cycles of meiosis, sporulation and mating. In each cycle, further
recombination occurs between YAC inserts and further recombinant forms of inserts are
generated. After one or several cycles of recombination have occurred, cells can be tested for
acquisition of a desired property. Further cycles of recombination, followed by selection, can
then be performed in similar fashion.

11. In vivo Shuffling of Genes by the Recursive Mating of Yeast Cells
Harboring Homologous Genes in Identical Loci
A goal of DNA shuffling is to mimic and expand the combinatorial capabilities
of sexual recombination. In vitro DNA shuffling succeeds in this process. However, by
changing [...]
recombination occurs, naturally in vitro recombination methods may jeopardize intrinsic
[...]

Shuffling in vivo by employing the natural crossing over mechanisms that occur
during meiosis may access inherent natural sequence information and provide a means of
creating higher quality shuffled libraries. Described here is a method for the in vivo shuffling
of DNA that utilizes the natural mechanisms of meiotic recombination and provides an
alternative method for DNA shuffling.

The basic strategy is to clone genes to be shuffled into identical loci within the
haploid genome of yeast. The haploid cells are then recursively induced to mate and to
sporulate. The process subjects the cloned genes to recursive recombination during recursive
cycles of meiosis. The resulting shuffled genes are then screened in situ or isolated and
screened under different conditions.

For example, if one wished to shuffle a family of five lipase genes, the
following provides a means of doing so in vivo.

The open reading frame of each lipase is amplified by PCR such that each
ORF is flanked by identical 3' and 5' sequences. The 5' flanking sequence is identical to a
region within the 5' coding sequence of the S. cerevisiae ura3 gene and the 3' flanking
sequence is identical to a region within the 3' portion of the ura3 gene. The flanking sequences are
chosen such that homologous recombination of the PCR product with the ura3 gene results in
the incorporation of the lipase gene and the disruption of the ura3 ORF. Both S. cerevisiae a
and α haploid cells are then transformed with each of the PCR amplified lipase ORFs, and
cells having incorporated a lipase gene into the ura3 locus are selected by growth on
5-fluoroorotic acid (5-FOA is lethal to cells expressing functional URA3). The result is 10 cell types,
two different mating types each harboring one of the five lipase genes in the disrupted ura3
locus. These cells are then pooled and grown under conditions where mating between the a
and α cells is favored, e.g., in rich medium.

Mating results in a combinatorial mixture of diploid cells having all 25 (5 x 5) possible
combinations of lipase genes in the two ura3 loci. The cells are then induced to sporulate by
growth under carbon and nitrogen limited conditions. During sporulation the diploid cells
undergo meiosis to form four (two a and two α) haploid ascospores housed in an ascus.
During meiosis I of the sporulation process, homologous chromosomes align and cross over. The lipase
genes cloned into the ura3 loci will also align and recombine. Thus the resulting haploid
ascospores will represent a library of cells each harboring a different possible chimeric lipase
gene, each a unique result of the meiotic recombination of the two lipase genes in the original
diploid cell. The walls of asci are degraded by treatment with zymolyase to liberate and allow
the mixing of the individual ascospores. This mixture is then grown under conditions that
promote the mating of the a and α haploid cells. It is important to liberate the individual
ascospores, since mating will otherwise occur between the ascospores within an ascus.
Mixing of the haploid cells allows recombination between more than two lipase genes,
enabling "poolwise recombination." Mating brings together new combinations of chimeric
genes that can then undergo recombination upon sporulation. The cells are recursively cycled
through sporulation, ascospore mixing, and mating until sufficient diversity has been generated
by the recursive pairwise recombination of the five lipase genes. The individual chimeric lipase
genes either can be screened directly in the haploid yeast cells or transferred to an appropriate
expression host.
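
The bookkeeping for the first mating in this example can be checked by enumeration. The sketch below assumes that every diploid arises from exactly one a cell and one α cell, each carrying one of the five lipase genes at the ura3 locus; the gene names are hypothetical placeholders.

```python
from itertools import combinations_with_replacement, product

GENES = ["lipA", "lipB", "lipC", "lipD", "lipE"]  # hypothetical gene names
MATING_TYPES = ["a", "alpha"]


def haploid_types():
    """Every (mating type, lipase gene) transformant."""
    return list(product(MATING_TYPES, GENES))


def diploid_pairings():
    """Every (gene from the a parent, gene from the alpha parent) pairing."""
    return list(product(GENES, repeat=2))


def distinct_genotypes():
    """Pairings counted without regard to which parent supplied which gene."""
    return list(combinations_with_replacement(GENES, 2))


if __name__ == "__main__":
    print(len(haploid_types()), "haploid cell types")
    print(len(diploid_pairings()), "a x alpha pairings")
    print(len(distinct_genotypes()), "distinct diploid genotypes")
```

Under these assumptions there are 10 haploid cell types, 5 x 5 = 25 a-by-α pairings, and 15 distinct diploid genotypes once parental origin is ignored; only the 20 pairings of two different genes can yield a chimeric gene in the first round.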

The process is described above for lipases and yeast; however, any sexual
organisms into which genes can be directed can be employed, and any genes, of course, could
be substituted for lipases. This process is analogous to the method of shuffling whole
genomes by recursive pairwise mating. The diversity, however, in the whole genome case is
distributed throughout the host genome rather than localized to specific loci.

12. Use of YACs to Clone Unlinked Genes
Shuffling of YACs is particularly amenable to transfer of unlinked but
functionally related genes from one species to another, particularly where such genes have not
been identified. Such is the case for several commercially important natural products, such as
taxol. Transfer of the genes in the metabolic pathway to a different organism is often desirable
because organisms naturally producing such compounds are not well suited for mass culturing.

Clusters of such genes can be isolated by cloning a total genomic library of
DNA from an organism producing a useful compound into a YAC library. The YAC library
is then transformed into yeast. The yeast is sporulated and mated such that recombination
occurs between YACs and/or between YACs and natural yeast chromosomes.
Selection/screening is then performed for expression of the desired collection of genes. If the
genes encode a biosynthetic pathway, expression can be detected from the appearance of
product of the pathway. Production of individual enzymes in the pathway, or intermediates of
the final expression product, or capacity of cells to metabolize such intermediates indicates
partial acquisition of the synthetic pathway. The original library or a different library can be
introduced into cells surviving selection/screening, and further rounds of recombination and
[...]
produced.

13. YAC-YAC Shuffling
If a phenotype of interest can be isolated to a single stretch of genomic DNA
less than 2 megabases in length, it can be cloned into a YAC and replicated in S. cerevisiae.
The cloning of similar stretches of DNA from related hosts into an identical YAC results in a
population of yeast cells each harboring a YAC having a homologous insert effecting a desired
phenotype. The recursive breeding of these yeast cells allows the homologous regions of
these YACs to recombine during meiosis, allowing genes, pathways, and clusters to recombine
during each cycle of meiosis. After several cycles of mating and segregation, the YAC inserts
are well shuffled. The now very diverse yeast library could then be screened for phenotypic
improvements resulting from the shuffling of the YAC inserts.
14. YAC-Chromosome Shuffling
"Mitotic" recombination occurs during cell division and results from the
recombination of genes during replication. This type of recombination is not limited to that
between sister chromatids and can be enhanced by agents that induce recombination
machinery, such as nicking chemicals and ultraviolet irradiation. Since it is often difficult to
directly mate across a species barrier, it is possible to induce the recombination of homologous
genes originating from different species by providing the target genes to a desired host
organism as a YAC library. The genes harbored in this library are then induced to recombine
with homologous genes on the host chromosome by enhanced mitotic recombination. This
process is carried out recursively to generate a library of diverse organisms, which is then screened
for those having the desired phenotypic improvements. The improved subpopulation is then
mated recursively as above to identify new strains having accumulated multiple useful genetic
alterations.

15. Accumulation of Multiple YACs Harboring Useful Genes
The accumulation of multiple unlinked genes that are required for the
acquisition or improvement of a given phenotype can be accomplished by the shuffling of
YAC libraries. Genomic DNA from organisms having desired phenotypes, such as ethanol
tolerance, thermotolerance, and the ability to ferment pentose sugars, is pooled, fragmented
and cloned into several different YAC vectors, each having a different selective marker (his,
ura, ade, etc.). S. cerevisiae are transformed with these libraries, selected for their
presence (using selective media, i.e., uracil dropout media for the YAC containing the Ura3
selective marker) and then screened for having acquired or improved a desired phenotype.
Surviving cells are pooled, mated recursively, and selected for the accumulation of multiple
YACs (by propagation in medium with multiple nutritional dropouts). Cells that acquire
multiple YACs harboring useful genomic inserts are identified by further screening. Optimized
strains can be used directly; however, due to the burden a YAC may pose to a cell, the
relevant YAC inserts can be minimized, subcloned, and recombined into the host chromosome
to generate a more stable production strain.
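
The logic of selecting for several YACs at once with multiple nutritional dropouts can be written down as a simple set test. This is a sketch only: the mapping of each omitted nutrient to the marker that complements it, and the names `MARKER_FOR_NUTRIENT`, `grows` and `select`, are illustrative assumptions, and each cell is represented merely by the set of YAC markers it carries.

```python
# Assumed mapping: nutrient omitted from the medium -> YAC marker that
# restores growth without it (marker labels follow the his/ura/ade
# shorthand used in the text).
MARKER_FOR_NUTRIENT = {"histidine": "his", "uracil": "ura", "adenine": "ade"}


def grows(cell_markers, dropouts):
    """A cell grows only if it carries a marker for every omitted nutrient."""
    return all(MARKER_FOR_NUTRIENT[n] in cell_markers for n in dropouts)


def select(population, dropouts):
    """Return the names of cells that survive on the given dropout medium."""
    return [name for name, markers in population.items()
            if grows(markers, dropouts)]


if __name__ == "__main__":
    cells = {
        "strain1": {"ura"},
        "strain2": {"ura", "his"},
        "strain3": {"ura", "his", "ade"},
    }
    print(select(cells, ["uracil"]))
    print(select(cells, ["uracil", "histidine", "adenine"]))
```

On single-dropout medium every strain carrying the corresponding YAC grows, while the triple-dropout medium retains only cells that have accumulated all three YACs.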

16. Choice of Host SSF Organism 

One example use for the present invention is to create an improved yeast for
the production of ethanol from lignocellulosic biomass. Specifically, a yeast strain with
improved ethanol tolerance and thermostability/thermotolerance is desirable. Parent yeast
strains known for good behavior in a Simultaneous Saccharification and Fermentation (SSF)
process are identified. These strains are combined with others known to possess ethanol
tolerance and/or thermostability.

S. cerevisiae is highly amenable to development for opti[...]
It inherently possesses several traits for this use, including the ability to import and ferment a
variety of sugars such as sucrose, glucose, galactose, maltose and maltotriose. Also, yeast has
the capability to flocculate, enabling recovery of the yeast biomass at the end of a fermentation
cycle, and allowing its re-use in subsequent bioprocesses. This is an important property in that
it optimizes the use of nutrients in the growth medium. S. cerevisiae is also highly amenable
to laboratory manipulation, has highly characterized genetics and possesses a sexual
reproductive cycle. S. cerevisiae may be grown under either aerobic or anaerobic conditions,
in contrast to some other potential SSF organisms that are strict anaerobes (e.g., Clostridium
spp.), making them very difficult to handle in the laboratory. S. cerevisiae are also "generally
[...]
comestibles for the general public (e.g., beer, wine, bread, etc.), is generally familiar and well
known. S. cerevisiae is commonly used in fermentative processes, and the familiarity in its
handling by fermentation experts eases the introduction of novel improved yeast strains into
the industrial setting.

S. cerevisiae strains that previously have been identified as particularly good
[...]
(1994) Appl. Biochem. Biotechnol. 45/46, 467-481; Ranatunga TD et al. (1997) Biotechnol.
Lett. 19: 1125-1127) can be used for starting materials. In addition, other industrially used S.
cerevisiae strains are optionally used as host strains, particularly those showing desirable
fermentative characteristics, such as S. cerevisiae Y567 (ATCC 24858) (Sitton OC et al.
(1979) Process Biochem. 14(9): 7-10; Sitton OC et al. (1981) Adv. Biotechnol. 2: 231-237;
McMurrough I et al. (1971) Folia Microbiol. 16: 346-349) and S. cerevisiae ACA 174 (ATCC
60868) (Benitez T et al. (1983) Appl. Environ. Microbiol. 45: 1429-1436; Chem. Eng. J. 50:
B17-B22, 1992), which have been shown to have desirable traits for large scale fermentation.
17. Choice of Ethanol Tolerant Strains

Many strains of S. cerevisiae have been isolated from high-ethanol
environments, and have survived in the ethanol-rich environment by adaptive evolution. For
example, strains from Sherry wine aging ("Flor" strains) have evolved highly functional
mitochondria to enable their survival in a high-ethanol environment. It has been shown that
transfer of these wine yeast mitochondria to other strains increases the recipient's resistance to
high ethanol concentration, as well as thermotolerance (Jimenez, J. and Benitez, T. (1988)
Curr. Genet. 13: 461-469). There are several flor strains deposited in the ATCC, for example
S. cerevisiae MY91 (ATCC 201301), MY138 (ATCC 201302), C5 (ATCC 201298), ET7
(ATCC 201299), LA6 (ATCC 201300), OSB21 (ATCC 201303), and F23 (S. globosus ATCC
90920). Also, several flor strains of S. uvarum and Torulaspora pretoriensis have been
deposited. Other ethanol-tolerant wine strains include S. cerevisiae ACA 174 (ATCC 60868),
15% ethanol, and S. cerevisiae A54 (ATCC 90921), isolated from wine containing 18% (v/v)
ethanol, and NRCC 202036 (ATCC 46534), also a wine yeast. Other S. cerevisiae
ethanologens that additionally exhibit enhanced ethanol tolerance include ATCC 24858,
ATCC 24858, G 3706 (ATCC 42594), NRRL Y-265 (ATCC 60593), and ATCC 24845 -
ATCC 24860. A strain of S. pastorianus (S. carlsbergensis ATCC 2345) has high ethanol
tolerance (13% v/v). S. cerevisiae Sa28 (ATCC 26603), from a Jamaican cane juice sample,
produces high levels of alcohol from molasses, is sugar tolerant, and produces ethanol from
wood acid hydrolyzate.

Several of the listed strains, as well as additional strains, can be used as starting
materials for breeding ethanol tolerance.

18. Choice of Temperature Tolerant Strains 
A few temperature tolerant strains have been reported, including the highly
flocculent strain S. pastorianus SA 23 (S. carlsbergensis ATCC 26602), which produces
ethanol at elevated temperatures, and S. cerevisiae Kyokai 7 (S. sake, ATCC 26422), a sake
yeast tolerant to brief heat and oxidative stress. Ballesteros et al. ((1991) Appl. Biochem.
Biotechnol. 28/29: 307-315) examined 27 strains of yeast for their ability to grow and ferment
glucose in the 32-45°C temperature range, including Saccharomyces, Kluyveromyces and
Candida spp. Of these, the best thermotolerant clones were Kluyveromyces marxianus LG
and Kluyveromyces fragilis 2671 (Ballesteros et al. (1993) Appl. Biochem. Biotechnol. 39/40:
201-211). S. cerevisiae-pretoriensis FDHI was somewhat thermotolerant but was poor
in ethanol tolerance. Recursive recombination of this strain with others that display ethanol
tolerance can be used to acquire the thermotolerant characteristics of the strain in progeny
which also display ethanol tolerance.

Candida acidothermophilum (Issatchenkia orientalis, ATCC 20381) is a good
SSF strain that also exhibits improved performance in ethanol production from lignocellulosic
biomass at higher SSF temperatures than S. cerevisiae D5A (Kadam, KL, Schmidt, SL (1997)
Appl. Microbiol. Biotechnol. 48: 709-713). This strain can also be a genetic contributor to an
improved SSF strain.

19. Shuffling of Strains

In those instances where strains are highly related, a recursive mating strategy
may be pursued. For example, a population of haploid S. cerevisiae (a and α) is
mutagenized and screened for improved EtOH or thermal tolerance. The improved haploid
population is mixed together, mated as a pool, and induced to sporulate. The resulting
haploid spores are freed by degrading the asci wall and mixed. The freed spores are then
induced to mate and sporulate recursively. This process is repeated a sufficient number of
times to generate all possible mutant combinations. The whole genome shuffled population
(haploid) is then screened for further EtOH or thermal tolerance.

When strains are not sufficiently related for recursive mating, formats based on
protoplast fusion may be employed. Recursive and poolwise protoplast fusion can be
performed to generate chimeric populations of diverse parental strains. The resultant pool of
progeny is selected and screened to identify improved ethanol and thermal tolerant strains.

Alternatively, a YAC-based Whole Genome Shuffling format can be used. In
this format, YACs are used to shuttle large chromosomal fragments between strains. As
detailed earlier, recombination occurs between YACs or between YACs and the host
[...]
fragmented and cloned into several different YAC vectors, each having a different selective
marker (his, ura, ade, etc.). S. cerevisiae are transformed with these libraries, selected for
their presence (using selective media, i.e., uracil dropout media for the YAC containing the
Ura3 selective marker) and then screened for having acquired or improved a desired
phenotype. Surviving cells are pooled, mated recursively (as above), and selected for the
accumulation of multiple YACs (by propagation in medium with multiple nutritional
dropouts). Cells that acquire multiple YACs harboring useful genomic inserts are identified by
further screening (see below).

20. Selection for Improved Strains
Having produced large libraries of novel strains by mutagenesis and
recombination, a first task is to isolate those strains that possess improvements in the desired
phenotypes. Identification of the organism libraries is facilitated where the desired key traits
are selectable phenotypes. For example, ethanol has different effects on the growth rate of a
yeast population, viability, and fermentation rate. Inhibition of cell growth and viability
increases with ethanol concentration, but high fermentative capacity is only inhibited at higher
ethanol concentrations. Hence, selection of growing cells in ethanol is a viable approach to
isolate ethanol-tolerant strains. Subsequently, the selected strains may be analyzed for their
fermentative capacity to produce ethanol. Provided that growth and media conditions are the
same for all strains (parents and progeny), a hierarchy of ethanol tolerance may be
constructed.

Simple selection schemes for identification of thermal tolerant and ethanol
tolerant strains are available and, in this case, are based on those previously designed to
identify potentially useful SSF strains. Selection of ethanol tolerance is performed by exposing
the population to ethanol, then plating the population and looking for growth. Colonies
capable of growing after exposure to ethanol can be re-exposed to a higher concentration of
ethanol and the cycle repeated until the most tolerant strains are selected. In order to discern
strains possessing heritable ethanol tolerance from those with temporarily acquired adaptations,
these cycles may be punctuated with cycles of growth in the absence of selection (e.g., no
ethanol).
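
A stepwise selection of this kind can be laid out as a schedule before the experiment is run. The sketch below is illustrative only: the function name `selection_schedule`, the integer percent (v/v) levels, and the choice of inserting one non-selective growth cycle after every second exposure are all assumptions, not values taken from the method.

```python
def selection_schedule(start, step, limit, relax_every=2):
    """Return ethanol levels (% v/v) for successive selection cycles.

    A level of 0 marks a cycle of growth without ethanol, inserted after
    every `relax_every` exposures so that transient adaptation can be
    told apart from heritable tolerance.
    """
    schedule = []
    level = start
    exposures = 0
    while level <= limit:
        schedule.append(level)
        exposures += 1
        if exposures % relax_every == 0:
            schedule.append(0)  # non-selective growth cycle
        level += step
    return schedule


if __name__ == "__main__":
    # Hypothetical run: 6% to 12% ethanol in 2% steps.
    print(selection_schedule(start=6, step=2, limit=12))
```

With the hypothetical inputs shown, the schedule is 6, 8, 0, 10, 12, 0: two exposures at rising ethanol concentration, a recovery cycle, and so on until the upper limit is reached.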

Alternatively, the mixed population can be grown directly at increasing
concentrations of ethanol, and the most tolerant strains enriched (Aguilera and Benitez, 1986,
Arch Microbiol 4:337-44). For example, this enrichment could be carried out in a chemostat or
turbidostat. Similar selections can be developed for thermal tolerance, in which strains are
identified by their ability to grow after a heat treatment, or directly for growth at elevated
temperatures (Ballesteros et al., 1991, Applied Biochem and Biotech, 28:307-315). The best
strains identified by these selections will be assayed more thoroughly in subsequent screens for
ethanol, thermal tolerance or other properties of interest.

In one aspect, organisms having increased ethanol tolerance are selected for. A
population of natural S. cerevisiae isolates is mutagenized. This population is then grown
under fermentor conditions at a low initial ethanol concentration. Once the culture has
reached saturation, the culture is diluted into fresh medium having a slightly higher ethanol
content. This process of successive dilution into medium of incrementally increasing ethanol
concentration is continued until a threshold of ethanol tolerance is reached. The surviving
mutants having the highest ethanol tolerance are then pooled and their genomes
recombined by any method noted herein. Enrichment could also be achieved by continuous
culture in a chemostat or turbidostat in which temperature or ethanol concentrations are
progressively elevated. The resulting shuffled population is then exposed once again to the
enrichment strategy, but at a higher starting medium ethanol concentration. This strategy is
optionally applied for the enrichment of strains having combined thermo- and ethanol tolerance.
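
The successive-dilution enrichment described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not part of the disclosed method: the strain names, tolerance values, starting concentration and step size are hypothetical, and each strain is assumed to survive a passage only if its tolerance is at least the ethanol concentration of the medium.

```python
def enrich(tolerances, start, step, min_survivors=1):
    """Passage a mixed population through stepwise-increasing ethanol.

    tolerances: dict mapping a (hypothetical) strain name to the highest
                ethanol level (% v/v) at which that strain still grows.
    Returns (highest level reached, sorted names of strains growing at it).
    """
    if step <= 0 or min_survivors < 1:
        raise ValueError("step must be positive and min_survivors at least 1")
    level = start
    survivors = {name: t for name, t in tolerances.items() if t >= level}
    if len(survivors) < min_survivors:
        raise ValueError("too few strains grow at the starting concentration")
    while True:
        next_level = level + step
        # Dilute into medium with slightly more ethanol; only tolerant strains grow.
        remaining = {n: t for n, t in survivors.items() if t >= next_level}
        if len(remaining) < min_survivors:
            # Threshold reached: pool the current survivors for recombination.
            return level, sorted(survivors)
        level, survivors = next_level, remaining


# Hypothetical pool of four mutants and their tolerances (% v/v ethanol).
pool = {"A": 6.0, "B": 8.5, "C": 9.0, "D": 7.0}
threshold, best = enrich(pool, start=5.0, step=1.0, min_survivors=2)
```

In a recursive scheme, the strains in `best` would be recombined and the resulting population passed through `enrich` again with a higher `start` value.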

21. Screening for Improved Strains
Strains showing viability in initial selections are assayed more quantitatively for
improvements in the desired properties before being reshuffled with other strains.

Progeny resulting from mutagenesis of a strain, or those preselected for their ethanol
tolerance and/or thermostability, can be plated on non-selective agar. Colonies can be picked
robotically into microtiter dishes and grown. Cultures are replicated to fresh microtiter plates
and the replicates are incubated under the appropriate stress condition(s). The growth or
metabolic activity of each clone is then measured. Indicators of viability
can include the size of growing colonies on solid media, the density of growing cultures or
the color change of a metabolic activity indicator added to liquid media. Strains that show the
greatest viability are then mixed and shuffled, and the resulting progeny are rescreened under
more stringent conditions.

22. Development of an Ethanologen Capable of Converting Cellulose to Ethanol

Once a stress-tolerant strain has been
developed, the degradation of cellulose to monomeric sugars is provided by the inclusion in
the host strain of an efficient cellulase degradation pathway.

Additional desirable characteristics can be useful to enhance the production of
ethanol by the host. For example, heterologous enzymes and pathways that
broaden the substrate sugar range may be included. "Tuning" of the strain can be
accomplished by the addition of various other traits, or the restoration of certain endogenous
traits that are desirable but were lost during the recombination procedures.

23. Conferring of Cellulase Activity
A vast number of cellulases and cellulase degradation systems have been
characterized from fungi, bacteria and yeast (see reviews by Beguin, P and Aubert, J-P (1994)
FEMS Microbiol. Rev. 13: 25-58; Ohmiya, K et al. (1997) Biotechnol. Genet. Eng. Rev. 14:
365-414). An enzymatic pathway required for efficient saccharification of cellulose involves
the synergistic action of endoglucanases (endo-1,4-β-D-glucanases, EC 3.2.1.4),
exocellobiohydrolases (exo-1,4-β-D-glucanases, EC 3.2.1.91), and β-glucosidases (cellobiases,
1,4-β-D-glucanases, EC 3.2.1.21) (Fig. 9). The heterologous production of cellulase enzymes
in the ethanologen would enable the saccharification of cellulose, producing monomeric sugars
that may be used by the organism for ethanol production. There are several advantages to the
heterologous expression of a functional cellulase pathway in the ethanologen. For example,
the SSF process would eliminate the need for a separate bioprocess step for saccharification,
and would ameliorate end-product inhibition of cellulase enzymes by accumulated intermediate
and product sugars.

Naturally occurring cellulase pathways are inserted into the ethanologen, or
one may choose to use custom improved "hybrid" cellulase pathways, employing the
coordinate action of cellulases derived from different natural sources, including thermophiles.

Several cellulases from non-Saccharomyces sources have been produced in and secreted
from Saccharomyces successfully, including bacterial, fungal, and yeast enzymes, for example T.
reesei CBH I (Shoemaker (1994), in "The Cellulase System of Trichoderma reesei:
Trichoderma strain improvement and Expression of Trichoderma cellulases in Yeast," Online,
Pinner, UK, 593-600). It is possible to employ straightforward metabolic engineering
techniques to engender cellulase activity in Saccharomyces. Also, yeast have been forced to
acquire elements of cellulose degradation pathways by protoplast fusion (e.g., intergeneric
hybrids of Saccharomyces cerevisiae and Zygosaccharomyces fermentati, a cellobiase-producing
yeast, have been created (Pina A, et al. (1986) Appl. Environ. Microbiol. 51: 995-1003)).
In general, any cellulase component enzyme that derives from a closely related yeast
organism could be transferred by protoplast fusion. Cellobiases produced by a somewhat
broader range of yeast may be accessed by whole genome shuffling in one of its many formats
(e.g., whole, fragmented, YAC-based).

Optimally, the cellulase enzymes to be used should exhibit good synergy, an
appropriate level of expression and secretion from the host, good specific activity (i.e.,
resistance to host degradation factors and enzyme modification) and stability in the desired
SSF environment. An example of a hybrid cellulose degradation pathway having excellent
synergy includes the following enzymes: CBH I exocellobiohydrolase of Trichoderma reesei,
the Acidothermus cellulolyticus E1 endoglucanase, and the Thermomonospora fusca E3
exocellulase (Baker, et al. (1998) Appl. Biochem. Biotechnol. 70-72: 395-403).

It is suggested here that these enzymes (or improved mutants thereof) be
considered for use in the SSF organism, along with a cellobiase (β-glucosidase), such as that
from Candida peltata. Other possible cellulase systems to be considered should possess
particularly good activity against crystalline cellulose, such as the T. reesei cellulase system
(Teeri, TT, et al. [...]), or particularly good
thermostability characteristics (e.g., cellulase systems from thermophilic organisms, such as
Thermomonospora fusca (Zhang, S., et al. (1995) Biochemistry 34: 3386-3395)).

A rational approach to the cloning of cellulases in the ethanologenic yeast host
could be used. For example, known cellulase genes are cloned into expression cassettes
utilizing S. cerevisiae promoter sequences, and the resultant linear fragments of DNA may be
transformed into the recipient host by placing short yeast sequences at the termini to
encourage site-specific integration into the genome. This is preferred to plasmidic
transformation for reasons of genetic stability and maintenance of the transforming DNA.

If an entire cellulose degradative pathway were introduced, a selection could be
implemented in an agar-plate-based format, and a large number of clones could be assayed for
cellulase activity in a short period of time. For example, selection for an exocellulase may be
accessible by providing a soluble oligocellulose substrate or carboxymethylcellulose (CMC) as
a sole carbon source to the host, which is otherwise unable to grow on agar containing this sole
carbon source. Clones producing active cellulase pathways would grow by virtue of their ability to
produce glucose.

Alternatively, if the different cellulases were to be introduced sequentially, it
would be useful to first introduce a cellobiase, enabling a selection using commercially
available cellobiose as a sole carbon source. Several strains of S. cerevisiae that are able to
grow on cellobiose have been created by introduction of a cellobiase gene (e.g., Rajoka MI, et
al. (1998) Folia Microbiol. (Praha) 43, 129-135; Skory, CD, et al. (1996) Curr. Genet. 30,
417-422; D'Auria, S, et al. (1996) Appl. Biochem. Biotechnol. [...]; [...] et
al. (1995) Yeast 11, 395-406; Adam, AC (1991) Curr. Genet. [...]).

Subsequent transformation of this organism with CBH I exocellulase can be
selected for by growth on a cellulose substrate such as carboxymethylcellulose (CMC).
Finally, addition of an endoglucanase creates a yeast strain with improved crystalline
cellulose degradation capacity.

24. Conferring of Pentose Sugar Utilization
Inclusion of pentose sugar utilization pathways is an important facet to a
potentially useful SSF organism. The successful expression of xylose sugar utilization
pathways for ethanol production has been reported in Saccharomyces (e.g., Chen, ZD and Ho,
NWY (1993) Appl. Biochem. Biotechnol. 39/40: [...]).

It would also be useful to accomplish L-arabinose substrate utilization for
ethanol production in the Saccharomyces host. Yeast strains that utilize L-arabinose include
some Candida and Pichia spp. (McMillan JD and Boynton BL (1994) Appl. Biochem.
Biotechnol. 45-46: 569-584; Dien BS, et al. (1996) Appl. Biochem. Biotechnol. 57-58: 233-242).
Genes necessary for arabinose fermentation in E. coli could also be introduced by
rational means (e.g., as has been performed previously in Z. mobilis (Deanda K, et al. (1996)
Appl. Environ. Microbiol. 62: 4465-4470)).

25. Conferring of Other Useful Activities

Several other traits that are important for optimization of an SSF strain have
been shown to be transferable to S. cerevisiae. Like thermal tolerance, cellulase activity and
pentose sugar utilization, these traits may not normally be exhibited by Saccharomyces (or the
particular strain of Saccharomyces being used as a host), and may be added by genetic means.
For example, expression of human muscle acylphosphatase in S. cerevisiae has been suggested
to increase ethanol production (Raugei, G., et al. (1996) Biotechnol. Appl. Biochem. 23: 273-278).

It can occur that evolved stress-tolerant SSF strains acquire some undesirable
mutations in the course of the evolution strategy. Indeed, this is a pervasive problem in strain
improvement strategies that rely on mutagenesis techniques, and can result in highly unstable
or fragile production strains. It is possible to restore some of the lost desirable traits by rational
methods such as cloning of specific genes that have been knocked out or negatively influenced
in the previous rounds of strain improvement. The advantage of this approach is specificity:
the offending gene may be targeted directly. The disadvantage is that it may be time-consuming
and repetitious if several genes have been compromised, and it only addresses
problems that have been characterized. A preferred (and more traditional) approach to the
removal of undesirable/deleterious mutations is to back-cross the evolved strain to a desirable
parent strain (e.g., the original "host" SSF strain). This strategy has been employed
successfully throughout strain improvement where accessible (i.e., for organisms that have
sexual cycles of reproduction). When lacking the advantage of a sexual process, it has been
accomplished by using other methods, such as parasexual recombination or protoplast fusion.
For example, the ability to flocculate was conferred on a non-flocculating strain of S.
cerevisiae by protoplast fusion with a flocculation-competent S. cerevisiae (Watari, J., et al.
(1990) Agric. Biol. Chem. 54: 1677-1681).

N. IN VITRO WHOLE GENOME SHUFFLING
The shuffling of large DNA sequences, such as eukaryotic chromosomes, is
difficult by prior art in vitro shuffling methods. A method for overcoming this limitation is
described herein.

The cells of related eukaryotic species are gently lysed and the intact
chromosomes are liberated. The liberated chromosomes are then sorted by FACS or a similar
method (such as pulsed-field gel electrophoresis), with chromosomes of similar size being
sequestered together. Each size fraction of the sorted chromosomes generally will represent a
pool of analogous chromosomes, for example the Y chromosomes of related mammals. The
goal is to isolate intact chromosomes that have not been irreversibly damaged.

The fragmentation and reassembly of such large complex pieces of DNA
employing DNA polymerases is difficult and would likely introduce an unacceptably high level
of random mutations. An alternative approach that employs restriction enzymes and DNA
ligase provides a feasible, less destructive solution. A chromosomal fraction is digested with
one or more restriction enzymes that recognize long DNA sequences (~15-20 bp), such as the
intron- and intein-encoded endonucleases (I-Ppo I, I-Ceu I, PI-Psp I, PI-Tli I, PI-Sce I (VDE)).
These enzymes each cut, at most, a few times within each chromosome, resulting in a
combinatorial mixture of large fragments, each having overhanging single-stranded termini that
are complementary to other sites cleaved by the same enzyme.

The fragments are then partially digested with an
exonuclease. The polarity of the nuclease chosen is dependent on the single-stranded
overhang resulting from the restriction enzyme chosen: a 5'-3' exonuclease for 3' overhangs
and a 3'-5' exonuclease for 5' overhangs. This digestion results in significantly long regions of
ssDNA overhang on each dsDNA terminus. The purpose of this incubation is to generate
regions of DNA that define specific regions where recombination can occur. The
fragments are then incubated under conditions where the ends of the fragments anneal with
other fragments having homologous ssDNA termini. Often, the two fragments annealing will
have originated from different chromosomes and, in the presence of DNA ligase, are covalently
linked to form a chimeric chromosome. This generates genetic diversity mimicking the
crossing over of homologous chromosomes. The complete ligation reaction will contain a
combinatorial mixture of ligation products of fragments having homologous
termini. A subset of this population will be complete chimeric chromosomes.
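
The combinatorial nature of the ligation products can be illustrated with a short enumeration. This is a simplified sketch, not the disclosed procedure: it assumes that every parental chromosome is cut at the same sites, that each fragment can only rejoin at its original position (because the extended overhangs are site-specific), and it counts only full-length products. The function name and segment labels are hypothetical.

```python
from itertools import product


def chimeric_chromosomes(parents):
    """Enumerate full-length reassemblies of segments from homologous parents.

    parents: list of chromosomes, each given as an ordered list of segments
             produced by cutting at the same sites (an assumption of this sketch).
    """
    if len({len(p) for p in parents}) != 1:
        raise ValueError("parents must be cut into the same number of segments")
    # Slot i holds each parent's version of segment i.
    slots = zip(*parents)
    return [tuple(choice) for choice in product(*slots)]


# Two hypothetical homologous chromosomes, each cut twice into three segments.
parents = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
library = chimeric_chromosomes(parents)
parental = [c for c in library if c in {tuple(p) for p in parents}]
chimeras = [c for c in library if c not in parental]
```

With two parents and three segments there are 2^3 = 8 full-length products, of which two are parental and six are chimeric; the number grows rapidly with additional parents or cut sites.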

To screen the shuffled library, the chromosomes are delivered to a suitable host
in a manner allowing for the uptake and expression of entire chromosomes. For example,
YACs (yeast artificial chromosomes) can be delivered to eukaryotic cells by protoplast fusion.
Thus, the shuffled library could be encapsulated in liposomes and fused with protoplasts of the
appropriate host cell. The resulting transformants would be propagated and screened for the
desired cellular improvements. Once an improved population was identified, the
chromosomes would be isolated, shuffled, and screened recursively.

O. WHOLE GENOME SHUFFLING OF NATURALLY COMPETENT 
MICROORGANISMS 

Natural competence is a phenomenon observed for some microbial species
whereby individual cells take up DNA from the environment and incorporate it into their
genome by homologous recombination. Bacillus subtilis and Acinetobacter spp. are known
to be particularly efficient at this process. A method for the whole genome shuffling (WGS)
of these and analogous organisms employing this process is described.

One goal of whole genome shuffling is the rapid accumulation of useful
mutations from a population of individual strains into one superior strain. If the organisms to
be evolved are naturally competent, then a split-pool strategy for the recursive
transformation of naturally competent cells with DNA originating from the pool will effect this
process. An example procedure is as follows.

A population of naturally competent organisms that demonstrates a variety of
useful traits (such as increased protein secretion) is identified. The strains are pooled, and the
pool is split. One half of the pool is used as a source of gDNA, while the other is used to
generate a pool of naturally competent cells.

The competent cells are grown in the presence of the pooled gDNA to allow
DNA uptake and recombination. Cells of one genotype take up and incorporate gDNA from
cells of a different type, generating cells having chimeric genomes. The result is a population
of cells representing a combinatorial mixture of the genetic variations originating in the
original pool. These cells are pooled again and transformed with the same source of DNA
again. This process is carried out recursively to increase the diversity of the genomes of cells
resulting from transformation. Once sufficient diversity has been generated, the cell
population is screened for new chimeric organisms demonstrating desired improvements.
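
The split-pool procedure can be illustrated with a small simulation. This is a toy model under stated assumptions, not the disclosed protocol: genomes are represented as tuples of alleles at a few loci, each cell replaces one randomly placed window of loci per round with the corresponding window from a randomly chosen donor, and the donor gDNA always comes from the original pool. All names and parameter values are hypothetical.

```python
import random


def shuffle_round(cells, donors, window, rng):
    """Each cell swaps in one random window of loci from a random donor genome."""
    n_loci = len(cells[0])
    shuffled = []
    for genome in cells:
        donor = rng.choice(donors)
        start = rng.randrange(n_loci - window + 1)
        end = start + window
        shuffled.append(genome[:start] + donor[start:end] + genome[end:])
    return shuffled


def recursive_shuffle(pool, rounds, window=1, seed=0):
    """Repeatedly transform the competent half with gDNA from the original pool."""
    if not pool or not 1 <= window <= len(pool[0]):
        raise ValueError("pool must be non-empty and window must fit the genome")
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    donors = list(pool)         # half of the split pool: source of gDNA
    cells = list(pool)          # other half: naturally competent recipients
    for _ in range(rounds):
        cells = shuffle_round(cells, donors, window, rng)
    return cells


# Four hypothetical strains, each carrying one useful mutation "M" at a different locus.
pool = [tuple("M" if locus == strain else "w" for locus in range(4)) for strain in range(4)]
library = recursive_shuffle(pool, rounds=5)
# Number of useful mutations combined in each shuffled genome.
mutation_counts = [genome.count("M") for genome in library]
```

The resulting `library` would then be screened, for example by ranking genomes on `mutation_counts` as a stand-in for a phenotypic assay.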

This process is enhanced by increasing the natural competence of the host
organism. COMS is a protein that, when expressed in B. subtilis, enhances the efficiency of
natural competence mediated transformation by more than an order of magnitude.

It was demonstrated that approximately 100% of the cells harboring the
plasmid pCOMS take up and recombine genomic DNA fragments into their genomes. In
general, approximately 10% of the genome is recombined into any given transformed cell. This
observation was demonstrated by the following.

A strain of B. subtilis pCOMS auxotrophic for two nutritional markers was
transformed with genomic DNA (gDNA) isolated from a prototrophic strain of the same
organism. 10% of the cells exposed to the DNA were prototrophic for one of the two nutrient
markers. The average size of the DNA strand taken up by B. subtilis is approximately 50 kb, or
~2% of the genome. Thus 1 of every ten cells had recombined a marker that was represented
in 1 of every fifty molecules of gDNA taken up. Thus, most of the cells take up and recombine
with approximately five 50 kb molecules, or 10% of the genome. This method represents a
powerful tool for rapidly and efficiently recombining whole microbial genomes.
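
The arithmetic behind this estimate can be written out explicitly. The sketch below is illustrative only; it assumes that a given marker lies on a fragment with probability equal to that fragment's share of the genome, and that the number of fragments per cell is small enough for the probabilities to add. The function and variable names are hypothetical.

```python
from fractions import Fraction


def fragments_per_cell(marker_frequency, fragment_fraction):
    """Average number of fragments recombined per cell.

    marker_frequency:  fraction of cells that acquired a given marker.
    fragment_fraction: fraction of the genome carried by one fragment, taken as
                       the chance that any one fragment carries the marker.
    """
    return marker_frequency / fragment_fraction


marker_frequency = Fraction(10, 100)   # 10% of cells became prototrophic for a marker
fragment_fraction = Fraction(2, 100)   # one 50 kb fragment is ~2% of the genome

n_fragments = fragments_per_cell(marker_frequency, fragment_fraction)
genome_share = n_fragments * fragment_fraction   # fraction of genome recombined per cell
```

Under these assumptions `n_fragments` is five and `genome_share` is one tenth, matching the figures of approximately five 50 kb molecules and 10% of the genome given above.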

In the absence of pCOMS, only 0.3% of the cells prepared for natural
competency take up and integrate a specific marker. This suggested that about 15% of the
cells actually underwent recombination with a single genomic fragment. Thus, a recursive
transformation strategy as described above produces a whole genome shuffled library even in
the absence of pCOMS. In the absence of pCOMS, however, the complex genomes will
represent a smaller, but still screenable, percentage of the transformed or shuffled population.
P. CONGRESSION 

Congression is the integration of two independent unlinked markers into a cell.
Of the cells that have integrated one marker, about 10% have taken up an additional marker.
Thus, if one selects or screens for the
integration of one specific marker, 10% of the resulting population will have integrated
another specific marker. This provides a way of enriching for specific integration events.

For example, if one is looking for the integration of a gene for which there is no
easy screen or selection, it will exist in 0.3% of the cell population. If the population is first
selected for a specific integration event, then the desired integration will be found in 10% of
the population. This represents a significant (~30-fold) enrichment for the desired event. This
enrichment is defined as the "congression effect." The congression effect is not influenced by
the presence of pCOMS; thus the "pCOMS effect" is simply to increase the percentage of
cells prepared for natural competence that are truly competent from about 15% in its absence to
100% in its presence. All competent cells still take up about the same amount of DNA, or
~10% of the Bacillus genome.
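
The enrichment figures quoted above can be checked with a short calculation. This is one reading of the numbers rather than a statement of how they were derived: in particular, treating the 15% figure as the 0.3% marker frequency divided by the ~2% of the genome carried on one fragment is an assumption of this sketch, as are the variable and function names.

```python
from fractions import Fraction

single_marker = Fraction(3, 1000)      # 0.3% of cells integrate a given marker (no pCOMS)
congression = Fraction(10, 100)        # 10% of marker-positive cells carry a second marker
fragment_fraction = Fraction(2, 100)   # one fragment is ~2% of the genome


def fold_enrichment(after, before):
    """Ratio of the frequency of the desired event after and before selection."""
    return after / before


# Enrichment obtained by first selecting for an unrelated integration event.
enrichment = fold_enrichment(congression, single_marker)

# Fraction of cells that recombined a single fragment, if each fragment has a
# ~2% chance of carrying the marker (assumed reading of the 15% figure).
competent_fraction = single_marker / fragment_fraction
```

Here `enrichment` evaluates to 100/3, i.e. roughly 33-fold, consistent with the "~30-fold" stated above, and `competent_fraction` evaluates to 15%.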

The congression effect can be used in the following examples to enhance whole
genome shuffling as well as the targeted integration of shuffled genes to the chromosome.

Q. B. SUBTILIS SHUFFLING

A population of B. subtilis cells having desired properties is identified, pooled
and shuffled as described above with one exception: once the pooled population is split, half of
the population is transformed with an antibiotic selection marker that is flanked by sequence
that targets its integration into, and disruption of, a specific nutritional gene, for example, one
involved in amino acid biosynthesis. Transformants resistant to the drug are auxotrophic for that
nutrient. The resistant population is pooled and grown under conditions rendering the cells
naturally competent (or optionally first transformed with pCOMS).

The competent cells are then transformed with gDNA isolated from the original
pool, and prototrophs are selected. The prototrophic population will have undergone
recombination with genomic fragments encoding a functional copy of the nutritional marker,
and thus will be enriched for cells having undergone recombination at other genetic loci by the
congression effect.

R. TARGETING OF GENES AND GENE LIBRARIES TO THE
CHROMOSOME

It is useful to be able to efficiently deliver genes or gene libraries directly to a
specific location in a cell's chromosome. As above, target cells are transformed with a positive
selection marker flanked by sequences that target its homologous recombination into the
chromosome. Selected cells harboring the marker are made naturally competent (with or
without pCOMS, but preferably the former) and transformed with a mixture of two sets of
DNA fragments. The first set contains a gene or a shuffled library of genes, each flanked with
sequence to target its integration to a specific chromosomal locus. The second set contains a
positive selection marker (different from that first integrated into the cells) flanked by
sequence that will target its integration and replacement of the first positive selection marker.
Under optimal conditions, the mixture is such that the gene or gene library is in molar excess
over the positive selection marker. Transformants are then selected for cells containing the
new positive marker. These cells are enriched for cells having integrated a copy of the desired
gene or gene library by the congression effect and can be directly screened for cells harboring
the gene or gene variants of interest. This process was carried out using PCR fragments
<10 kb, and it was found that, employing the congression effect, a population can be enriched
such that 50% of the cells are congressants. Thus, one in two cells contained a gene or gene
variant.

Alternatively, the expression host can lack the first positive selection
marker, and the competent cells are transformed with a mixture of the target genes and a
limiting amount of the first positive selection marker fragment. Cells selected for the positive
marker are screened for the desired properties in the targeted genes. The improved genes are
amplified by PCR, shuffled again, and then returned to the original host again with the first
positive selection marker. This process is carried out recursively until the desired function of
the genes is obtained. This process obviates the need to construct a primary host strain and
the need for two positive markers.

S. CONJUGATION-MEDIATED GENETIC EXCHANGE

Conjugation can be employed in the evolution of cell genomes in several ways.
Conjugative transfer of DNA occurs during contact between cells. See Guiney (1993) in
Bacterial Conjugation (Clewell, ed., Plenum Press, New York), pp. 75-104; Reimmann &
Haas in Bacterial Conjugation (Clewell, ed., Plenum Press, New York 1993), at pp. 137-188
(incorporated by reference in their entirety for all purposes). Conjugation occurs between
many types of gram negative bacteria, and some types of gram positive bacteria. Conjugative
transfer is also known between bacteria and plant cells (Agrobacterium tumefaciens) or yeast.
As discussed in patent 5,837,458, the genes responsible for conjugative transfer can
themselves be evolved to expand the range of cell types (e.g., from bacteria to mammals)
between which such transfer can occur.

Conjugative transfer is effected by an origin of transfer (oriT) and flanking
genes (MOB A, B and C), and 15-25 genes, termed tra, encoding the structures and enzymes
necessary for conjugation to occur. The transfer origin is defined as the site required in cis for
DNA transfer. Tra genes include tra A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, Q, R, S, T,
U, V, W, X, Y, Z, vir AB (alleles 1-11), C, D, E, G, [...]. Tra genes can be
expressed in cis or trans to oriT. Other cellular enzymes, including those of the RecBCD
pathway, RecA, SSB protein, DNA gyrase, DNA pol I, and DNA ligase, are also involved in
conjugative transfer. RecE or recF pathways can substitute for RecBCD.

One structural protein encoded by a tra gene is the sex pilus, a filament
constructed of an aggregate of a single polypeptide protruding from the cell surface. The sex
pilus binds to a polysaccharide on recipient cells and forms a conjugative bridge through which
DNA can transfer. This process activates a site-specific nuclease encoded by a MOB gene,
which specifically cleaves DNA to be transferred at oriT. The cleaved DNA is then threaded
through the conjugation bridge by the action of other tra enzymes.

Mobilizable vectors can exist in episomal form or integrated into the
chromosome. Episomal mobilizable vectors can be used to exchange fragments inserted into
the vectors between cells. Integrated mobilizable vectors can be used to mobilize adjacent
genes from the chromosome.

T. USE OF INTEGRATED MOBILIZABLE VECTORS TO PROMOTE
EXCHANGE OF GENOMIC DNA

The F plasmid of E. coli integrates into the chromosome at high frequency and
mobilizes genes unidirectionally from the site of integration (Clewell, 1993, supra; Firth et al.,
in Escherichia coli and Salmonella Cellular and Molecular Biology 2, 2377-2401 (1996);
Frost et al., Microbiol. Rev. 58, 162-210 (1994)). Other mobilizable vectors do not
spontaneously integrate into a host chromosome at high efficiency, but can be induced to do
so by growth under particular conditions (e.g., treatment with a mutagenic agent, growth at a
nonpermissive temperature for plasmid replication). See Reimmann & Haas in Bacterial
Conjugation (ed. Clewell, Plenum Press, NY 1993), Ch. 6. Of particular interest is the IncP
group of conjugal plasmids, which are typified by their broad host range (Clewell, 1993, supra).

Donor "male" bacteria which bear a chromosomal insertion of a conjugal
plasmid, such as the E. coli F factor, can efficiently donate chromosomal DNA to recipient
"female" enteric bacteria which lack F (F-). Conjugal transfer from donor to recipient is
initiated at oriT. Transfer of the nicked single strand to the recipient occurs in a 5' to 3'
direction by a rolling circle mechanism which allows mobilization of tandem chromosomal
copies. Upon entering the recipient, the donor strand is discontinuously replicated. The
linear, single-stranded donor DNA strand is a potent substrate for initiation of recA-mediated
homologous recombination within the recipient. Recombination between the donor strand and
recipient chromosomes can result in the inheritance of donor traits. Accordingly, strains which
bear a chromosomal copy of F are designated Hfr (for high frequency of recombination) (Low,
1996 in Escherichia coli and Salmonella Cellular and Molecular Biology Vol. 2, pp. 2402-2405;
Sanderson, in Escherichia coli and Salmonella Cellular and Molecular Biology 2,
2406-2412 (1996)).

The ability of strains with an integrated mobilizable vector to transfer
chromosomal DNA provides a rapid and efficient means of exchanging genetic material
among a population of bacteria, thereby allowing combination of positive mutations and
dilution of negative mutations. Such shuffling methods typically start with a population of
strains with an integrated mobilizable vector encompassing at least some genetic diversity.
The genetic diversity can be the result of natural variation, exposure to a mutagenic agent or
[...] are mutagenized with a Tn-oriT element and screened for acquisition of an auxotrophy (e.g.,
by replica-plating to minimal and complete media) resulting from insertion of the Tn-oriT
element in any one of many biosynthetic genes scattered across the genome. The resulting
auxotrophs are pooled and allowed to mate under conditions promoting male-to-male matings,
e.g., during growth in close proximity on a filter membrane. Note that transfer functions are
provided by the helper conjugal plasmid present in the original strain set. Recombinant
transconjugants are selected on minimal medium and screened for further improvement.

Optionally, strains bearing integrated mobilizable vectors are defective in
mismatch repair gene(s). Inheritance of donor traits which arise from sequence heterologies
increases in strains lacking the methyl-directed mismatch repair system. Optionally, the gene
products which decrease recombination efficiency can be inhibited by small molecules.

Intergeneric conjugal transfer between species such as E. coli and Salmonella
typhimurium, which are 20% divergent at the DNA level, is also possible if the recipient strain
is mutH, mutL or mutS (see Rayssiguier et al., Nature 342, 396-401 (1989)). Such transfer
can be used to obtain recombination at several points as shown by the following example.

One example uses an S. typhimurium Hfr donor strain having markers thr557 at
map position 0, pyrF2690 at 33 min, serA13 at 62 min and hfrK5 at 43 min. MutS +/-, F- E.
coli recipient strains had markers pyrD68 at 21 min, aroC355 at 51 min, ilv3164 at 85 min and
mutS215 at 59 min. The triauxotrophic S. typhimurium Hfr donor and isogenic mutS+/-
triauxotrophic E. coli recipients were inoculated into 3 ml of LB broth and shaken at 37°C until
fully grown. 100 µl of the donor and each recipient were mixed in 10 ml fresh LB broth, and
then deposited onto a sterile Millipore 0.45 µm HA filter using a Nalgene 250 ml reusable
filtration device. The donor and recipients alone were similarly diluted and deposited to check
for reversion. The filters with cells were placed cell-side-up on the surface of an LB agar plate,
which was incubated overnight at 37°C. The filters were removed with the aid of sterile
forceps and placed in a sterile 50 ml tube containing 5 ml of minimal salts broth. Vigorous
vortexing was used to wash the cells from the filters. 100 µl of mating mixtures, as well as
donor and recipient controls, were spread onto LB for viable cell counts and onto minimal glucose
supplemented with either two of the three recipient requirements for single recombinant
counts, one of the three requirements for double recombinant counts, or none of the three
requirements for triple recombinant counts. The plates were incubated for 48 hr at 37°C, after
which colonies were counted.

The data indicate that recombinants can be generated at reasonable frequencies
using Hfr matings. Intergeneric recombination is enhanced 100-200 fold in a recipient that is
mutS.

Frequencies are further enhanced by increasing the ratio of donor to recipient
cells, or by repeatedly mating the original donor strains with the previously generated
recombinant progeny.
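
The selective plating scheme used in the mating example above (two, one or none of the three recipient requirements supplied) can be enumerated programmatically. This is an illustrative sketch only: the requirements are labeled by the recipient marker names because the corresponding nutrient supplements are not listed here, and the function name is hypothetical.

```python
from itertools import combinations


def selection_plates(requirements):
    """Map the number of recombined markers to the supplement sets that select for it.

    A plate supplying k of the n recipient requirements permits growth only of
    cells that have regained prototrophy for the other n - k markers.
    """
    n = len(requirements)
    plates = {}
    for n_supplied in range(n - 1, -1, -1):
        n_recombined = n - n_supplied
        plates[n_recombined] = [frozenset(c) for c in combinations(requirements, n_supplied)]
    return plates


# Recipient auxotrophic markers from the example above.
recipient_markers = ("pyrD68", "aroC355", "ilv3164")
plates = selection_plates(recipient_markers)
```

For three markers this yields three plate types for single recombinants, three for double recombinants and one unsupplemented plate for triple recombinants.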

U. INTRODUCTION OF FRAGMENTS BY CONJUGATION
Mobilizable vectors can also be used to transfer fragment libraries into cells to
be evolved. This approach is particularly useful in situations in which the cells to be evolved
cannot be efficiently transformed directly with the fragment library but can undergo
conjugation with primary cells that can be transformed with the fragment library.

DNA fragments to be introduced into host cells encompass diversity relative
to the host cell genome. The diversity can be the result of natural diversity or mutagenesis.
The DNA fragment library is cloned into a mobilizable vector having an origin of transfer.
Some such vectors also contain mob genes, although alternatively these functions can also be
provided in trans. The vector should be capable of efficient conjugal transfer between primary
cells and the intended host cells. The vector should also confer a selectable phenotype. This
phenotype can be the same as the phenotype being evolved or can be conferred by a marker,
such as a drug resistance marker. The vector should preferably allow self-elimination in the
intended host cells, thereby allowing selection for cells in which a cloned fragment has
undergone genetic exchange with a homologous host segment rather than duplication. Such
can be achieved by use of a vector lacking an origin of replication functional in the intended host
type or inclusion of a negative selection marker in the vector.

One suitable vector is the broad host range conjugation plasmid described by
Simon et al., Bio/Technology 1, 784-791 (1983); Trieu-Cuot et al., Gene 102, 99-104 (1991);
Bierman et al., Gene 116, 43-49 (1992). These plasmids can be transformed into E. coli and
then force-mated into bacteria that are difficult or impossible to transform by chemical or
electrical induction of competence. These plasmids contain the origin of transfer, oriT, of the
IncP plasmid. Mobilization functions are supplied in trans by chromosomally-integrated copies
of the necessary genes. Conjugal transfer of DNA can in some cases be assisted by treatment of
the recipient (if gram-positive) with sub-inhibitory concentrations of penicillins (Trieu-Cuot
et al., 1993, FEMS Microbiol. Lett. 109, 19-23). To increase diversity in populations, recursive
conjugal mating is performed prior to screening.

Cells that have undergone allelic exchange with library fragments can be
screened or selected for evolution toward a desired phenotype. Subsequent rounds of
recombination can be performed by repeating the conjugal transfer step. The library of
fragments can be fresh or can be obtained from some (but not all) of the cells surviving a
previous round of selection/screening. Conjugation-mediated shuffling can be combined with
other methods of shuffling.

V. GENETIC EXCHANGE PROMOTED BY TRANSDUCING PHAGE 
Phage transduction can include the transfer, from one cell to another, of
nonviral genetic material within a viral coat (Masters, in Escherichia coli and Salmonella
Cellular and Molecular Biology 2, 2421-2442 (1996)). Perhaps the two best examples of
generalized transducing phage are bacteriophages P1 and P22 of E. coli and S. typhimurium,
respectively. Generalized transducing bacteriophage particles are formed at a low frequency
during lytic infection when viral-genome-sized, double-stranded fragments of host (which
serves as donor) chromosomal DNA are packaged into phage heads. Promiscuous high
transducing (HT) mutants of bacteriophage P22 which efficiently package DNA with little
sequence specificity have been isolated. Infection of a susceptible host results in a lysate in
which up to 50% of the phage are transducing particles. Adsorption of the generalized
transducing particle to a susceptible recipient cell results in the injection of the donor
chromosomal fragment. RecA-mediated homologous recombination following injection of the
donor fragment can result in the inheritance of donor traits. Another type of phage which
achieves quasi-random insertion of DNA into the host chromosome is Mu. For an overview of
Mu biology, see, Groisman (1991) in Methods in Enzymology v. 204. Mu can generate a
variety of chromosomal rearrangements including deletions, inversions, duplications and
transpositions. In addition, elements which combine the features of P22 and Mu are available,
including Mud-P22, which contains the ends of the Mu genome in place of the P22 att site and
int gene. See, Berg, supra.

Generalized transducing phage can be used to exchange genetic material
between a population of cells encompassing genetic diversity and susceptible to infection by
the phage. Genetic diversity can be the result of natural variation between cells, induced
mutation of cells or the introduction of fragment libraries into cells. DNA is then exchanged
between cells by generalized transduction. If the phage does not cause lysis of cells, the entire
population of cells can be propagated in the presence of phage. If the phage results in lytic
infection, transduction is performed on a split pool basis. That is, the starting population of
cells is divided into two. One subpopulation is used to prepare transducing phage. The
transducing phage are then infected into the other subpopulation. Preferably, infection is
performed at high multiplicity of phage per cell so that few cells remain uninfected. Cells
are then screened or selected for evolution toward a desired
property. The pool of cells surviving screening/selection can then be shuffled by a further
round of generalized transduction or by other shuffling methods. Recursive split pool
transduction is optionally performed prior to selection to increase the diversity of any
population to be screened.
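
By way of illustration only, the split pool scheme described above can be sketched as a small simulation. The sketch assumes that each genome is a list of alleles, that a transduced fragment is a short contiguous run of positions, and that the genome is circular; the function name, the fragment length and the multiplicity value are hypothetical and are not taken from this specification.

```python
import random


def split_pool_transduction(population, fragment_len=3, moi=2, rng=None):
    """Simulate one round of split pool transduction (illustrative model only).

    population   -- list of genomes, each a list of alleles of equal length
    fragment_len -- number of adjacent positions carried by one transducing particle
    moi          -- transducing particles received per recipient cell
    """
    rng = rng or random.Random(0)
    cells = [list(genome) for genome in population]
    rng.shuffle(cells)

    # Divide the starting population into two subpopulations.
    half = len(cells) // 2
    donors, recipients = cells[:half], cells[half:]

    progeny = []
    for genome in recipients:
        for _ in range(moi):
            # Each particle carries a random fragment from a random donor cell.
            donor = rng.choice(donors)
            start = rng.randrange(len(genome))
            for offset in range(fragment_len):
                pos = (start + offset) % len(genome)  # circular genome assumed
                genome[pos] = donor[pos]  # homologous replacement at the same locus
        progeny.append(genome)
    return progeny


if __name__ == "__main__":
    # Six parental strains, each marked by a distinct allele at every position.
    parents = [[strain] * 10 for strain in range(6)]
    for genome in split_pool_transduction(parents):
        print(genome)
```

In this model only the recipient subpopulation yields progeny, since the donor subpopulation is consumed in preparing the lysate; the progeny would be grown up and screened before any further round.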

The efficiency of the above methods can be increased by reducing infection of
cells by infectious (nontransducing) phage and by reducing lysogen formation. The former can
be achieved by inclusion of chelators of divalent cations, such as citrate and EGTA, in culture
media. Divalent cations are required for phage adsorption, and the inclusion of chelating agents
therefore provides a means of preventing unwanted infection. Tail-defective transducing phages
can also be used to allow only a single round of infection. Integration-defective (int-)
derivatives of generalized transducing phage can be used to prevent lysogen formation. In a
further variation, host cells with defects in mismatch repair gene(s) can be used to increase
recombination between transduced DNA and genomic DNA.

1. Use of Locked-in Prophages to Facilitate DNA Shuffling
The use of a hybrid, mobile genetic element (locked-in prophages) as a means
to facilitate whole genome shuffling of organisms using phage transduction as a means to
transfer DNA from donor to recipient is a preferred embodiment. One such element
(Mud-P22) based on the temperate Salmonella phage P22 has been described for use in
genetic and physical mapping of mutations. See, Youderian et al. (1988) Genetics 118:581-592,
and Benson and Goldman (1992) J. Bacteriol. 174(5):1673-1681. Individual Mud-P22
insertions package specific regions of the Salmonella chromosome into phage P22 particles.
Libraries of random Mud-P22 insertions can be readily isolated and induced to create pools of
phage particles packaging random chromosomal DNA fragments. These phage particles can
be used to infect new cells and transfer the DNA from the host into the recipient in the process
of transduction. Alternatively, the packaged chromosomal DNA can be isolated and
manipulated further by techniques such as DNA shuffling or any other mutagenesis technique
prior to being reintroduced into cells (especially recD cells for linear DNA) by transformation
or electroporation, where they integrate into the chromosome.

Either the intact transducing phage particles or isolated DNA can be subjected
to a variety of mutagens prior to reintroduction into cells to enhance the mutation rate.
Mutator cell lines such as mutD can also be used for phage growth. Either method can be
used recursively in a process to create genes or strains with desired properties. E. coli cells
carrying a cosmid clone of Salmonella LPS genes are infectable by P22 phage. It is possible to
develop similar genetic elements using other combinations of transposable elements and
bacteriophages or viruses as well.

P22 is a lambdoid phage that packages its DNA into preassembled phage
particles (heads) by a "headful" mechanism. Packaging of phage DNA is initiated at a specific
site (pac) and proceeds unidirectionally along a linear, double-stranded, normally concatemeric
molecule. When the phage head is full (~43 kb), the DNA strand is cleaved, and packaging of
the next phage head is initiated. Locked-in or excision-defective P22 prophages, however,
initiate packaging at their pac site, and then proceed unidirectionally along the chromosome,
packaging successive headfuls of chromosomal DNA (rather than phage DNA). When these
transducing phages infect new Salmonella cells they inject the chromosomal DNA from the
original host into the recipient cell, where it can recombine into the chromosome by
homologous recombination, creating a chimeric chromosome. Upon infection of recipient cells
at a high multiplicity of infection, recombination can also occur between incoming transducing
fragments prior to recombination into the chromosome.

Integration of such locked-in P22 prophages at various sites in the
chromosome allows flanking regions to be amplified and packaged into phage particles. The
Mud-P22 mobile genetic element contains an excision-defective P22 prophage flanked by the
ends of phage/transposon Mu. The entire Mud-P22 element can transpose to virtually any
location in the chromosome or other episome (e.g., F', BAC clone) when the Mu A and B
proteins are provided in trans.
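
As a rough illustration of the headful mechanism described above, the following sketch computes which chromosomal intervals would be packaged by a locked-in prophage inserted at a given position. It assumes a circular chromosome, a headful of exactly 43 kb (the text gives this only as an approximate value), and packaging in the direction of increasing coordinate; the function name and the 4800 kb chromosome length are hypothetical values chosen for the example.

```python
def headful_fragments(pac_site_kb, chromosome_kb, headful_kb=43, n_headfuls=3):
    """Return (start, end) coordinates in kb of successive headfuls.

    Packaging starts at the pac site of the locked-in prophage and proceeds
    unidirectionally; each head takes headful_kb of DNA, and the next headful
    begins where the previous one ended.
    """
    fragments = []
    start = pac_site_kb
    for _ in range(n_headfuls):
        end = start + headful_kb
        # Coordinates wrap around because the chromosome is assumed circular.
        fragments.append((start % chromosome_kb, end % chromosome_kb))
        start = end
    return fragments


if __name__ == "__main__":
    # Insertion at 100 kb on an illustrative 4800 kb chromosome.
    print(headful_fragments(100, 4800))
    # -> [(100, 143), (143, 186), (186, 229)]
```

Under these assumptions, an insertion packages a gene of interest only if the gene lies within a few headfuls downstream of the pac site, which is why insertions near the gene are sought.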

A number of embodiments for this type of genetic element are available. In one
example, the locked-in prophage are used as generalized transducing phage to transfer random
fragments of a donor chromosome into a recipient. The Mud-P22 element acts as a
transposon when Mu A and B transposase proteins are provided in trans and integrates copies
of itself at random locations in the chromosome. In this way, a library of random
chromosomal Mud-P22 insertions can be generated in a suitable host. When the Mud-P22
prophages in this library are induced, random fragments of chromosomal DNA will be
packaged into phage particles. When these phages infect recipient cells, the chromosomal
DNA is injected and can recombine into the chromosome of the recipient. These recipient
cells are screened for a desired property and cells showing improvement are then propagated.
The process can be repeated, since the Mud-P22 genetic element is not transferred to the
recipient in this process. Infection at a high multiplicity allows for multiple chromosomal
fragments to be injected and recombined into the recipient chromosome.

Individual insertions near a gene of interest can be isolated from a random insertion library by
a variety of methods. Induction of these specific prophages results in packaging of flanking
chromosomal DNA including the gene(s) of interest. Infection of
recipient cells with these phages and recombination of the packaged DNA into the
chromosome creates chimeric genes that can be screened for desired properties. Infection at a
high multiplicity of infection can allow recombination between incoming transducing
fragments prior to recombination into the chromosome.

These specialized transducing phage can also be used to isolate large quantities
of high quality DNA containing specific genes of interest without any prior knowledge of the
DNA sequence. Cloning of specific genes is not required. Insertion of such an element nearby
a biosynthetic operon, for example, allows for large amounts of DNA from that operon to be
isolated for use in DNA shuffling (in vitro and/or in vivo), cloning, sequencing, or other uses
as set forth herein. DNA isolated from similar insertions in other organisms containing
homologous operons is optionally mixed for use in family shuffling formats as described
herein, in which homologous genes from different organisms (or different chromosomal
locations within a single species, or both) are recombined. Alternatively, the transduced
population is recursively transduced with pooled transducing phage or new transducing phage
generated from the previously transduced cells. This can be carried out recursively to optimize
the diversity of the genes prior to shuffling.

Phage isolated from insertions in a variety of strains or organisms containing
homologous operons are optionally mixed and used to coinfect cells at a high MOI, allowing
for recombination between incoming transducing fragments prior to recombination into the
chromosome.

Locked-in prophage are useful for mapping of genes, operons, and/or specific
mutations with either desirable or undesirable phenotypes. Locked-in prophages can also
provide a means to separate and map multiple mutations in a given host. If one is looking for
beneficial mutations outside a gene or operon of interest, then an unmodified gene or operon
can be transduced into a mutagenized or shuffled host, which is then screened for the presence
of desired secondary mutations. Alternatively, the gene/operon of interest can be readily moved
from a mutagenized/shuffled host into a different background to screen/select for
modifications in the gene/operon itself.

It is also possible to develop similar genetic elements using other combinations
of transposable elements and bacteriophages or viruses. Similar systems are set up in
other organisms, e.g., that do not allow replication of P22 or P1. Broad host range phages
and transposable elements are especially useful. Similar genetic elements are derived from
other temperate phages that also package by a headful mechanism. In general, these are the
phages that are capable of generalized transduction. Viruses infecting eukaryotic cells may be
adapted for similar purposes. Examples of generalized transducing phages that are useful are
described in: Green et al., "Isolation and preliminary characterization of lytic and lysogenic
phages with wide host range within the streptomycetes", J. Gen. Microbiol. 131(9):2459-2465
(1985); Stuttard et al., "Genome structure in Streptomyces spp.: adjacent genes on the S.
coelicolor A3(2) linkage map have cotransducible analogs in S. venezuelae", J. Bacteriol.
169(8):3814-3816 (1987); Wang et al., "High frequency generalized transduction by miniMu
plasmid phage", Genetics 116(2):201-206 (1987); Welker, N. E., "Transduction in Bacillus
stearothermophilus", J. Bacteriol. 176(11):3354-3359 (1988); Darzins et al., "Mini-D3112
bacteriophage transposable elements for genetic analysis of Pseudomonas aeruginosa", J.
Bacteriol. 171(7):3909-3916 (1989); Hugouvieux-Cotte-Pattat et al., "Expanded linkage map
of Erwinia chrysanthemi strain 3937", Mol. Microbiol. 3(5):573-581 (1989); Ichige et al.,
"Establishment of gene transfer systems for and construction of the genetic map of a marine
Vibrio strain" [...]; [...] transducing phage [...]
35(12):1073-1084 (1991); Regue et al., "A generalized transducing bacteriophage for Serratia
marcescens", Res. Microbiol. 142(1):23-27 (1991); Kiesel et al., "Phage Acm1-mediated
transduction in the facultatively methanol-utilizing Acetobacter methanolicus MB 58/4", J.
Gen. Virol. 74(9):1741-1745 (1993); Blahova et al., "Transduction of imipenem resistance by
the phage F-116 from a nosocomial strain of Pseudomonas aeruginosa isolated in Slovakia",
Acta Virol. 38(5):247-250 (1994); Kidambi et al., "Evidence for phage-mediated gene transfer
among Pseudomonas aeruginosa strains on the phylloplane", Appl. Environ. Microbiol.
60(2):496-500 (1994); Weiss et al., "Isolation and characterization of a generalized
transducing phage for Xanthomonas campestris pv. campestris", J. Bacteriol. 176(11):3354-3359
(1994); Matsumoto et al., "Clustering of the trp genes in Burkholderia (formerly
Pseudomonas) cepacia", FEMS Microbiol. Lett. 134(2-3):265-271 (1995); Schicklmaier et al.,
"Frequency of generalized transducing phages in natural isolates of the Salmonella
typhimurium complex", Appl. Environ. Microbiol. 61(4):1637-1640 (1995); Humphrey
et al., "[...] transducing bacteriophage of [...]"; [...]
antibiotic resistance markers among Actinobacillus actinomycetemcomitans strains by
temperate bacteriophages Aa phi 23", Cell. Mol. Life Sci. 53(11-12):904-910 (1997); Jensen et
al., "Prevalence of broad-host-range lytic bacteriophages of Sphaerotilus natans, Escherichia
coli, and Pseudomonas aeruginosa", Appl. Environ. Microbiol. 64(2):575-580 (1998); and
Nedelmann et al., "Generalized transduction for genetic linkage analysis and transfer of
transposon insertions in different Staphylococcus epidermidis strains", Zentralbl. Bakteriol.
287(1-2):85-92 (1998).

A Mud-P1/Tn-P1 system comparable to Mud-P22 is developed using phage
P1. Phage P1 has an advantage of packaging much larger (~110 kb) fragments per headful.
Phage P1 is currently used to create bacterial artificial chromosomes or BACs. P1-based
BAC vectors are designed along these principles so that cloned DNA is packaged into phage
particles, rather than the current system, which requires DNA preparation from single-copy
episomes. This combines the advantages of both systems in having the genes cloned in a
stable single-copy format, whilst allowing for amplification and specific packaging of cloned
DNA upon induction of the prophage.

W. RANDOM PLACEMENT OF GENES OR IMPROVED GENES
THROUGHOUT THE GENOME FOR OPTIMIZATION OF GENE
CONTEXT

The placement and orientation of genes in a host chromosome (the "context" of
the gene in a chromosome) or episome has large effects on gene expression and activity.
Random integration of plasmid or other episomal sequences into a host chromosome by
non-homologous recombination, followed by selection or screening for the desired phenotype,
is a preferred way of identifying optimal chromosomal positions for expression of a target. This
strategy is illustrated in Fig. 18.

A variety of transposon-mediated delivery systems can be employed to deliver
genes of interest, either individual genes, genomic libraries, or a library of shuffled gene(s),
randomly throughout the genome of a host. Thus, in one preferred embodiment, the
improvement of a cellular function is achieved by cloning a gene of interest, for example a
gene encoding a desired metabolic pathway, within a transposon delivery vehicle.

Such transposon vehicles are available for both Gram-negative and
Gram-positive bacteria. De Lorenzo and Timmis (1994) Methods in Enzymology 235:385-404
describe the analysis and construction of stable phenotypes in gram-negative bacteria with
Tn5- and Tn10-derived minitransposons. Kleckner et al. (1991) Methods in Enzymology
204, chapter 7 describe uses of transposons such as Tn10, including for use in gram-positive
bacteria. Petit et al. (1990) Journal of Bacteriology 172(12):6736-6740 describe Tn10-derived
transposons active in Bacillus subtilis. The transposon delivery vehicle is introduced
into a cell population, which is then selected for recombinant cells that have incorporated the
transposon into the genome.

The selection is typically by any of a variety of drug resistance markers also
carried within the transposon. The selected subpopulation is screened for cells having
improved expression of the gene(s) of interest. Once cells harboring the genes of interest in
the optimal location are isolated, the genes are amplified from within the genome using PCR,
shuffled, and cloned back into a similar transposon delivery vehicle which contains a different
selection marker within the transposon and lacks the transposon integrase gene.

This shuffled library is then transformed back into the strain harboring the
original transposon, and the cells are selected for the presence of the new resistance marker
and the loss of the previous selection marker. Selected cells are enriched for those that have
exchanged by homologous recombination the original transposon for the new transposon
carrying members of the shuffled library. The surviving cells are then screened for further
improvements in the expression of the desired phenotype. The genes from the improved cells
are then amplified by PCR and shuffled again. This process is carried out recursively,
oscillating each cycle between the different selection markers. Once the gene(s) of interest are
optimized to a desired level, the fragment can be amplified and again randomly distributed
throughout the genome as described above to identify the optimal location of the improved
genes.
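
The marker oscillation described above can be written down as a simple schedule. The following is a minimal sketch, assuming two alternating markers; the marker names and the function name are hypothetical placeholders and do not refer to any particular vector.

```python
from itertools import cycle, islice


def marker_schedule(markers=("kanR", "camR"), n_cycles=4):
    """List, for each shuffling cycle, the marker selected for and the marker
    whose loss is selected (the marker carried by the previous transposon)."""
    schedule = []
    previous = None  # the first cycle has no earlier transposon to displace
    for current in islice(cycle(markers), n_cycles):
        schedule.append({"select_for": current, "select_loss_of": previous})
        previous = current
    return schedule


if __name__ == "__main__":
    for round_no, step in enumerate(marker_schedule(), start=1):
        print(round_no, step)
```

Each cycle after the first selects for the incoming marker and for loss of the marker from the preceding cycle, which is what enriches for exchange of the resident transposon by homologous recombination.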

Alternatively, the gene(s) conferring a desired property may not be known. In
this case the DNA fragments cloned within the transposon delivery vehicle could be a library
of genomic fragments originating from a population of cells derived from one or more strains
having the desired property(ies). The library is delivered to a population of cells derived from
one or more strains having or lacking the desired property(ies) and cells incorporating the
transposon are selected. The surviving cells are then screened for acquisition or improvement
of the desired property. The fragments contained within the surviving cells are amplified by
PCR and then cloned as a pool into a similar transposon delivery vector harboring a different
selection marker from the first delivery vector. This library is then delivered to the pool of
surviving cells, and the population having acquired the new selective marker is selected. The
[...]
phenotype are explored in a combinatorial fashion. This process is carried out repetitively
with each new cycle employing an additional selection marker. Alternatively, PCR fragments
are cloned into a pool of transposon vectors having different selective markers. These are
delivered to cells and selected for 1, 2, 3, or more markers.

Alternatively, the amplified fragments from each improved cell are shuffled
independently. The shuffled libraries are then cloned back into a transposon delivery vehicle
similar to the original vector but containing a different selection marker and lacking the
transposase gene. Selection is then for acquisition of the new marker and loss of the previous
marker. Selected cells are enriched for those incorporating the shuffled variants of the
amplified genes by homologous recombination. This process is carried out recursively,
oscillating each cycle between the two selective markers.

X. IMPROVEMENT OF OVEREXPRESSED GENES FOR A DESIRED
PHENOTYPE

The improvement of a cellular property or phenotype is often enhanced by
increasing the copy number or expression of gene(s) participating in the expression of that
property. Genes that have such an effect on a desired property can also be improved by DNA
shuffling to have a similar effect. A genomic DNA library is cloned into an overexpression
vector and transformed into a target cell population such that the genomic fragments are
highly expressed in cells selected for the presence of the overexpression vector. The selected
cells are then screened for improvement of a desired property. The overexpression vectors
from the improved cells are isolated and the cloned genomic fragments shuffled. The genomic
fragment carried in the vector from each improved isolate is shuffled independently or with
identified homologous genes (family shuffling). The shuffled libraries are then delivered back
to a population of cells and the selected transformants rescreened for further improvements in
the desired property. This shuffling/screening process is cycled recursively until the desired
property has been optimized to the desired level.

As stated above, gene dosage can greatly enhance a desired cellular property.
One method of increasing gene copy number of unknown genes is using a method of random
amplification (see also, Mavingui et al. (1997) Nature Biotech. 15, 564). In this method, a
genomic library is cloned into a suicide vector containing a selective marker that also at higher
dosage provides an enhanced phenotype. An example of such a marker is the kanamycin
resistance gene. At successively higher copy number, resistance to successively higher levels
of kanamycin is achieved. The genomic library is delivered to a target cell by any of a variety
of methods including transformation, transduction, conjugation, etc. Cells that have
incorporated the vector into the chromosome by homologous recombination between the
vector and chromosomal copies of the cloned genes can be selected by requiring expression of
the selection marker under conditions where the vector does not replicate. This recombination
event results in the duplication of the cloned DNA fragment in the host chromosome with a
copy of the vector and selection marker separating the two copies. The population of
surviving cells is screened for improvement of a desired cellular property resulting from the
gene duplication event. Further gene duplication events resulting in additional copies of the
original cloned DNA fragments can be generated by further propagating the cells under
successively more stringent selective conditions, i.e., increased concentrations of kanamycin. In
this case selection requires increased copies of the selective marker, but increased copies of
the desired gene fragment are concomitant. Surviving cells are further screened for an
improvement in the desired phenotype. The resulting population of cells likely contains
amplifications of different genes, since many genes often affect a given phenotype. To generate
a library of the possible combinations of these genes, cells of the original selected library showing
phenotypic improvements are recombined, using the methods described herein, e.g., protoplast
fusion, split pool transduction, transformation, conjugation, etc.

The recombined cells are selected for increased expression of the selective
marker. Survivors are enriched for cells having incorporated additional copies of the vector
sequence by homologous recombination, and these cells will be enriched for those having
combined duplications of different genes. In other words, the duplication from one cell of
enhanced phenotype becomes combined with the duplication of another cell of enhanced
phenotype. These survivors are screened for further improvements in the desired phenotype.
This procedure is repeated recursively until the desired level of phenotypic expression is
achieved.

Alternatively, genes that have been identified or are suspected as being
beneficial in increased copy number are cloned in tandem into appropriate plasmid vectors.
These vectors are then transformed and propagated in an appropriate host organism.
Plasmid-plasmid recombination between the cloned gene fragments results in further
duplication of the genes. Resolution of the plasmid doublet can result in the uneven
[...]
having fewer gene copies. Cells carrying this distribution of plasmids are then screened for an
improvement in the phenotype affected by the gene duplications.

In summary, a method of selecting for increased copy number of a nucleic acid
sequence by the above procedure is provided. In the method, a genomic library in a suicide
vector comprising a dose-sensitive selectable marker is provided, as noted above. The
genomic library is transduced into a population of target cells. Target cells in the population
are selected for increasing doses of the selectable marker under conditions in
which the suicide vector does not replicate episomally. A plurality of target cells are selected
for the desired phenotype, recombined and reselected. The process is recursively repeated, if
desired, until the desired phenotype is obtained.
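
The selection logic summarized above can be illustrated with a toy model. The sketch below assumes, purely for illustration, that tolerance to the selective agent rises linearly with marker copy number and that each growth period gives every cell a fixed chance of gaining one further tandem copy; the function name, the tolerance per copy, the duplication probability and the doses are all hypothetical.

```python
import random


def escalate_selection(copy_numbers, doses, tolerance_per_copy=25.0,
                       dup_prob=0.3, rng=None):
    """Apply successively more stringent doses to a population of cells.

    copy_numbers -- marker copy number of each starting cell
    doses        -- increasing concentrations of the selective agent
    Returns the copy numbers of the survivors and a (dose, survivors) history.
    """
    rng = rng or random.Random(0)
    survivors = list(copy_numbers)
    history = []
    for dose in doses:
        # Growth: some cells acquire one further duplication of the integrated
        # vector together with the flanking cloned fragment.
        survivors = [n + 1 if rng.random() < dup_prob else n for n in survivors]
        # Selection: only cells whose copy number tolerates the dose survive.
        survivors = [n for n in survivors if n * tolerance_per_copy >= dose]
        history.append((dose, len(survivors)))
    return survivors, history


if __name__ == "__main__":
    cells, log = escalate_selection([1] * 100, doses=[25, 50, 75])
    for dose, count in log:
        print(f"dose {dose}: {count} survivors")
    print("minimum copy number among survivors:", min(cells, default=None))
```

Because the marker and the cloned fragment are amplified together, raising the dose in steps enriches for cells carrying more copies of the fragment, which are then screened for the desired phenotype.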

Y. STRATEGIES FOR IMPROVING GENOMIC SHUFFLING VIA
TRANSFORMATION OF LINEAR DNA FRAGMENTS

Wild-type members of the Enterobacteriaceae (e.g., Escherichia coli) are
typically resistant to genetic exchange following transformation of linear DNA molecules.
This is due, at least in part, to the Exonuclease V (Exo V) activity of the RecBCD holoenzyme,
which rapidly degrades linear DNA molecules following transformation. Production of Exo V
has been traced to the recD gene, which encodes the D subunit of the holoenzyme. As
demonstrated by Russel et al. (1989) Journal of Bacteriology 2609-2613, homologous
recombination between a transformed linear donor DNA molecule and the chromosome of the
recipient is readily detected in strains bearing a loss-of-function mutation in recD.
The use of recD strains provides a simple means for genomic shuffling of the
Enterobacteriaceae. For example, a bacterial strain or set of related strains bearing a recD
null mutation (e.g., the E. coli recD1903::mini-Tet allele) is mutagenized and screened for the
desired properties. In a split-pool fashion, chromosomal DNA prepared from one aliquot is
used to transform (e.g., via electroporation or chemically induced competence) the second
aliquot. The resulting transformants are then screened for improvement, or recursively
transformed prior to screening.

The use of RecE/RecT, as described supra, can improve homologous
recombination of linear DNA fragments.

The RecBCD holoenzyme plays an important role in initiation of
RecA-dependent homologous recombination. Upon recognizing a dsDNA end, the RecBCD
enzyme unwinds and degrades the DNA asymmetrically in a 5' to 3' direction until it
encounters a chi (or "χ") site (consensus 5'-GCTGGTGG-3'), which attenuates the nuclease
activity. This results in the generation of a ssDNA terminating near the chi site with a 3'-ssDNA
tail that is preferred for RecA loading and subsequent invasion of dsDNA for homologous
recombination. Accordingly, preprocessing of transforming fragments with a 5' to 3' specific
ssDNA Exonuclease, such as Lambda (λ) exonuclease (available, e.g., from Boehringer
Mannheim), prior to transformation may serve to stimulate homologous recombination in a recD-
strain by providing a ssDNA invasive end for RecA loading and subsequent strand invasion.

The addition of DNA sequence encoding chi sites (consensus
5'-GCTGGTGG-3') to DNA fragments can serve to both attenuate Exonuclease V activity
and stimulate homologous recombination, thereby obviating the need for a recD mutation (see
also, Kowalczykowski, et al. (1994) "Biochemistry of homologous recombination in
Escherichia coli," Microbiol. Rev. 58:401-465 and Jessen, et al. (1998) "Modification of
bacterial artificial chromosomes through Chi-stimulated homologous recombination and its
application in zebrafish transgenesis." Proc. Natl. Acad. Sci. 95:5121-5126).

Chi sites are optionally included in linkers ligated to the ends of transforming
fragments or incorporated into the external primers used to generate DNA fragments to be
transformed. The use of recombination-stimulatory sequences such as chi is a generally useful
approach for evolution of a broad range of cell types by fragment transformation.
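
As a minimal sketch of incorporating a chi site into an external primer, the following helper prepends the consensus octamer to the 5' end of a primer sequence. The function names and the example primer are hypothetical. The text does not specify which orientation of the site relative to the fragment end is required, so the sketch exposes the orientation as a parameter and leaves the choice to the user as an assumption to be verified.

```python
CHI = "GCTGGTGG"  # consensus chi site, written 5' to 3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def add_chi_tail(primer, reverse_orientation=False, chi=CHI):
    """Prepend a chi site, or its reverse complement, to the 5' end of a primer.

    reverse_orientation=True places the chi octamer on the strand complementary
    to the primer instead of on the primer strand itself.
    """
    tail = reverse_complement(chi) if reverse_orientation else chi
    return tail + primer.upper()


if __name__ == "__main__":
    example_primer = "atgaccatgattacg"  # placeholder sequence for illustration
    print(add_chi_tail(example_primer))
    print(add_chi_tail(example_primer, reverse_orientation=True))
```

Fragments amplified with such tailed primers carry the site at each end; whether the site is functional for a nuclease entering from the adjacent end depends on the orientation chosen.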

Methods to inhibit or mutate analogs of Exo V or other nucleases (such as
Exonucleases I (endA1), III (nth), IV (nfo), VII, and VIII of E. coli) are similarly useful.
Inhibition or elimination of nucleases, or modification of ends of transforming DNA fragments
to render them resistant to exonuclease activity, has applications in evolution of a broad range
of cell types.

Z. SHUFFLING TO OPTIMIZE UNKNOWN INTERACTIONS
Many observed traits are the result of complex interactions of multiple genes or
gene products. Most such interactions are still uncharacterized. Accordingly, it is often
unclear which genes need to be optimized to achieve a desired trait, even if some of the genes
contributing to the trait are known.

This lack of characterization is not an issue during DNA shuffling, which
produces solutions that optimize whatever is selected for. An alternative approach, which has
the potential to solve not only this problem, but also anticipated future rate limiting factors, is
complementation by overexpression of unknown genomic sequences.

A library of genomic DNA is first made as described, supra. This is
transformed into the cell to be optimized and transformants are screened [...].
Genomic fragments which result in an improved property are evolved by [...]
information, nor any knowledge or assumptions about the nature of protein or pathway
interactions, or even of what steps are rate-limiting; it relies only on detection of the desired
phenotype. This sort of random cloning and subsequent evolution by DNA shuffling of
positively interacting genomic sequences is extremely powerful and generic. A variety of
sources of genomic DNA are used, from isogenic strains to more distantly related species with
potentially desirable properties. In addition, the technique is applicable to any cell for which
the molecular biology basics of transformation and cloning vectors are available, and for any
property which can be assayed (preferably in a high-throughput format). Alternatively, once
optimized, the evolved DNA can be returned to the chromosome by homologous
recombination or randomly by phage mediated site-specific recombination.

AA. HOMOLOGOUS RECOMBINATION WITHIN THE CHROMOSOME
Homologous recombination within the chromosome is used to circumvent the 
limitations of plasmid based evolution and size restrictions. The strategy is similar to that 
described above for shuffling genes within their chromosomal context, except that no in vitro
shuffling occurs. Instead, the parent strain is treated with mutagens such as ultraviolet light or
nitrosoguanidine, and improved mutants are selected. The improved mutants are pooled and
split. Half of the pool is used to generate random genomic fragments for cloning into a
homologous recombination vector. Additional genomic fragments are optionally derived from
related species with desirable properties. The cloned genomic fragments are homologously
recombined into the genomes of the remaining half of the mutant pool, and variants with
improved properties are selected. These are subjected to a further round of mutagenesis,
selection and recombination. Again, this process is entirely generic for the improvement of any
whole cell biocatalyst for which a recombination vector and an assay can be developed. Here
again, it should be noted that recombination can be performed recursively prior to screening.
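
The cycle of mutagenesis, selection, pooling and splitting, and recombination described
above can be illustrated with a small simulation. The following Python sketch is illustrative
only and forms no part of the described method. It assumes that a genome can be modeled as
a list of 0/1 loci (1 being a beneficial allele) and that the assay simply counts beneficial
alleles. All names and values (GENOME_LEN, mutagenize, recombine, evolve, the mutation
rate and the fragment length) are hypothetical.

```python
import random

GENOME_LEN = 40  # hypothetical number of loci; 1 marks a beneficial allele


def mutagenize(genome, rate, rng):
    """Flip each locus with probability `rate` (stand-in for a chemical or UV mutagen)."""
    return [1 - g if rng.random() < rate else g for g in genome]


def fitness(genome):
    """Toy assay: count beneficial alleles."""
    return sum(genome)


def select(population, keep):
    """Keep the `keep` highest-scoring cells."""
    return sorted(population, key=fitness, reverse=True)[:keep]


def recombine(recipient, donor, frag_len, rng):
    """Replace one random window of the recipient with the donor's homologous window."""
    start = rng.randrange(0, len(recipient) - frag_len + 1)
    child = list(recipient)
    child[start:start + frag_len] = donor[start:start + frag_len]
    return child


def evolve(rounds=5, pop_size=200, keep=20, rate=0.02, frag_len=8, seed=0):
    """Run mutagenesis -> selection -> pool/split -> recombination -> selection."""
    rng = random.Random(seed)
    pool = [[0] * GENOME_LEN]  # single parent strain
    history = []
    for _ in range(rounds):
        mutants = [mutagenize(rng.choice(pool), rate, rng) for _ in range(pop_size)]
        improved = select(mutants, keep)
        # Pool and split: one half donates fragments, the other half receives them.
        rng.shuffle(improved)
        half = len(improved) // 2
        donors, recipients = improved[:half], improved[half:]
        recombinants = [
            recombine(rng.choice(recipients), rng.choice(donors), frag_len, rng)
            for _ in range(pop_size)
        ]
        pool = select(recombinants + improved, keep)
        history.append(fitness(pool[0]))
    return history
```

Calling evolve() returns the best assay score found after each round. The window
replacement in recombine is a deliberately crude stand-in for homologous recombination of a
cloned genomic fragment into the chromosome.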

BB. METHODS FOR RECURSIVE SEQUENCE RECOMBINATION
Some formats and examples for recursive sequence recombination, sometimes
referred to as DNA shuffling or molecular breeding, have been described by the present
inventors and co-workers in copending application, attorney docket no. 16528A-014612, filed
March 25, 1996, PCT/US95/02126 filed February 17, 1995 (published as WO 95/22625);
Stemmer, Science 270, 1510 (1995); Stemmer et al., Gene, 164, 49-53 (1995); Stemmer,
Bio/Technology, 13, 549-553 (1995); Stemmer, Proc. Natl. Acad. Sci. USA 91, 10747-10751
(1994); Stemmer, Nature 370, 389-391 (1994); Crameri et al., Nature Medicine, 2(1): 1-3,
(1996), and Crameri et al., Nature Biotechnology 14, 315-319 (1996) (each of which is
incorporated by reference in its entirety for all purposes).

As shown in Figs. 16 and 17, DNA shuffling provides the most rapid technology
for evolution of complex new functions. As shown in Fig. 16, panel A, recombination in
DNA shuffling achieves accumulation of multiple beneficial mutations in a few cycles. In
contrast, because of the high frequency of deleterious mutations relative to beneficial ones,
iterative point mutation must build beneficial mutations one at a time, and consequently
requires many cycles to reach the same point. As shown in Fig. 16, panel B, rather than a
simple linear sequence of mutation accumulation, DNA shuffling is a parallel process where
multiple problems may be solved independently, and then combined.

1. In Vitro Formats
One format for shuffling in vitro is illustrated in Fig. 1. The initial substrates
for recombination are a pool of related sequences. The X's in Fig. 1, panel A, show where the
sequences diverge. The sequences can be DNA or RNA and can be of various lengths
depending on the size of the gene or DNA fragment to be recombined or reassembled.
Preferably the sequences are from 50 bp to 50 kb.

The pool of related substrates is converted into overlapping fragments, e.g.,
from about 5 bp to 5 kb or more, as shown in Fig. 1, panel B. Often, the size of the fragments
is from about 10 bp to 1000 bp, and sometimes from about
100 bp to 500 bp. The conversion can be effected by a number of different methods, such as
DNaseI or RNase digestion, random shearing or partial restriction enzyme digestion.
Alternatively, [...] amplification of substrates or PCR pri[...]. Alternatively, appropriate
single-stranded fragments can be generated on a nucleic acid synthesizer. The concentration
of nucleic acid fragments of a particular length and sequence is often less than 0.1% or 1% by
weight of the [...]
mixture is usually at least about 100, 500 or 1000.

The mixed population of nucleic acid fragments is converted to at least
partially single-stranded form. Conversion can be effected by heating to about 80°C to
100°C, more preferably from 90°C to 96°C, to form single-stranded nucleic acid fragments and
[...] to reannealing. Conversion can also be effected by treatment with single-stranded DNA
binding protein or recA protein. Single-stranded nucleic acid fragments having regions of
[...] cooling to 4°C to 75°C, and preferably from 40°C to 65°C. Renaturation can be accelerated
by the addition of polyethylene glycol (PEG), other volume-excluding reagents or salt. The
salt concentration is preferably from 0 mM to 200 mM, more preferably the salt concentration
is from 10 mM to 100 mM. The salt may be KCl or NaCl. The concentration of PEG is
preferably from 0% to 20%, more preferably from 5% to 10%. The fragments that reanneal
can be from different substrates as shown in Fig. 1, panel C. The annealed nucleic acid
fragments are incubated in the presence of a nucleic acid polymerase, such as Taq or Klenow,
or proofreading polymerases, such as pfu or pwo, and dNTP's (i.e., dATP, dCTP, dGTP and
dTTP). If regions of sequence identity are large, Taq polymerase can be used with an
annealing temperature of between 45-65°C. If the areas of identity are small, Klenow
polymerase can be used with an annealing temperature of between 20-30°C (Stemmer, Proc.
Natl. Acad. Sci. USA (1994), [...]
acid fragments prior to annealing, simultaneously with annealing or after annealing.
The process of denaturation, renaturation and incubation in the presence of
polymerase of overlapping fragments to generate a collection of polynucleotides containing
different permutations of fragments is sometimes referred to as shuffling of the nucleic acid in
vitro. This cycle is repeated for a desired number of times. Preferably the cycle is repeated
from 2 to 100 times, more preferably the sequence is repeated from 10 to 40 times. The
resulting nucleic acids are a family of double-stranded polynucleotides of from about 50 bp to
about 100 kb, preferably from 500 bp to 50 kb, as shown in Fig. 1, panel D. The population
represents variants of the starting substrates showing substantial sequence identity thereto but
also diverging at several positions. The population has many more members than the starting
substrates. The population of fragments resulting from shuffling is used to transform host
cells, optionally after cloning into a vector.
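
The way in which reassembly of overlapping fragments yields polynucleotides containing
different permutations of the parental sequences can be sketched computationally. The
following Python example is a simplified illustration only, not the procedure itself. It assumes
the parents are pre-aligned strings of equal length, and it models fragmentation and
polymerase-mediated reassembly as copying successive fragments of random length from
randomly chosen parents. The function names and the fragment length bounds are
hypothetical.

```python
import random


def shuffle_once(parents, min_frag, max_frag, rng):
    """Build one chimera by copying successive fragments from random parents.

    Assumes all parents are aligned and of equal length, so a fragment taken from
    any parent at a given offset is homologous to the same offset in the others.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    pieces = []
    pos = 0
    while pos < length:
        frag_len = rng.randint(min_frag, max_frag)  # random fragment size
        template = rng.choice(parents)              # fragment may come from any parent
        pieces.append(template[pos:pos + frag_len])
        pos += frag_len
    return "".join(pieces)


def shuffled_library(parents, size, min_frag, max_frag, rng):
    """Generate a library of `size` chimeric sequences."""
    return [shuffle_once(parents, min_frag, max_frag, rng) for _ in range(size)]
```

For example, shuffled_library(["AAAAAAAAAA", "CCCCCCCCCC"], 5, 2, 4, random.Random(0))
yields five ten-base chimeras in which every position carries the base found at that position
in one of the two parents. Point mutations introduced by the polymerase are not modeled.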

In a variation of in vitro shuffling, subsequences of recombination substrates
can be generated by amplifying the full-length sequences under conditions which produce a
substantial fraction, typically at least 20 percent or more, of incompletely extended
amplification products. The amplification products, including the incompletely extended
amplification products, are denatured and subjected to at least one additional cycle of
reannealing and amplification. This variation, in which at least one cycle of reannealing and
amplification provides a substantial fraction of incompletely extended products, is termed
"stuttering." In the subsequent amplification round, the incompletely extended products
reanneal to and prime extension on different sequence-related template species.

In a further variation, a mixture of fragments is spiked with one or more
oligonucleotides. The oligonucleotides can be designed to include precharacterized mutations
of a wildtype sequence, or sites of natural variations between individuals or species. The
oligonucleotides also include sufficient sequence or structural homology flanking such
mutations or variations to allow annealing with the wildtype fragments. Some
oligonucleotides may be random sequences. Annealing temperatures can be adjusted
depending on the length of homology.

In a further variation, recombination occurs in at least one cycle by template
switching, such as when a DNA fragment derived from one template primes on the
homologous position of a related but different template. Template switching can be induced
by addition of recA, rad51, rad55, rad57 or other polymerases (e.g., viral polymerases, reverse
transcriptase) to the amplification mixture. Template switching can also be increased by
increasing the DNA template concentration.
WO 00/04190 PCT/US99/15972

In a further variation, at least one cycle of amplification can be conducted using
a collection of overlapping single-stranded DNA fragments [...]
lengths. Fragments can be prepared using a single stranded DNA phage, such as M13. Each
fragment can hybridize to and prime polynucleotide chain [...]
the collection, thus forming sequence-recombined [...].
DNA fragments of variable length can be generated from a single primer by Vent or other
DNA polymerase on a [...] DNA template. The single stranded DNA fragments are used as
[...]
Mol. Biology 29, 572-577 (1995).

2. In Vivo Formats 
(a). Plasmid-Plasmid Recombination 

The initial substrates for recombination are a collection of polynucleotides
comprising variant forms of a gene. The variant forms often show substantial sequence
identity to each other sufficient to allow homologous recombination between substrates. The
diversity between [...]
(e.g., error[...]
resynthesizing genes encoding natural proteins with alternative and/or mixed codon usage.
There should be at [...]
substrates differing in at least two positions. However, commonly a library of substrates of
10^[...]-10^[...] members is employed. The degree of diversity depends on the length of the
substrate being recombined and the extent of the functional change to be evolved. Diversity at
between 0.1-50% of positions is typical. The diverse substrates are incorporated into plasmids.
The plasmids are often standard cloning vectors, e.g., bacterial multicopy plasmids. However, in
some methods to be described below, the plasmids include mobilization functions. The
substrates can be incorporated into the same or different plasmids. Often at least two different
types of plasmid having different types of selection marker are used to allow selection for cells
containing at least two types of vector. Also, where different types of plasmid are employed,
the different plasmids can come from two distinct incompatibility groups to allow stable
co-existence of two different plasmids within the cell. Nevertheless, plasmids from the same
incompatibility group can still co-exist within the same cell for sufficient time to allow
homologous recombination to occur.
Plasmids containing diverse substrates are initially introduced into prokaryotic
or eukaryotic cells by any transfection methods (e.g., chemical transformation, natural
competence, electroporation, viral transduction or biolistics). Often, the plasmids are present
at or near saturating concentration (with respect to maximum transfection capacity) to
increase the probability of more than one plasmid entering the same cell. The plasmids
containing the various substrates can be transfected simultaneously or in multiple rounds. For
example, in the latter approach cells can be transfected with a first aliquot of plasmid,
transfectants selected and propagated, and then infected with a second aliquot of plasmid.

Having introduced the plasmids into cells, recombination between substrates to
generate recombinant genes occurs within cells containing multiple different plasmids merely
by propagating in the cells. However, cells that receive only one plasmid are unable to
participate in recombination and the potential contribution of substrates on such plasmids to
evolution is not fully exploited (although these plasmids may contribute to some extent if they
are propagated in mutator cells or otherwise accumulate point mutations (e.g., by ultraviolet
radiation treatment)). The rate of evolution can be increased by allowing all substrates to
participate in recombination. This can be achieved by subjecting transfected cells to
electroporation. The conditions for electroporation are the same as those conventionally used
for introducing exogenous DNA into cells (e.g., 1,000-2,500 volts, 400 µF and a 1-2 mm
gap). Under these conditions, plasmids are exchanged between cells, allowing all substrates to
participate in recombination. In addition, the products of recombination can undergo further
rounds of recombination with each other or with the original substrate. The rate of evolution
can also be increased by use of conjugative transfer. Conjugative transfer systems are known
in many bacteria (E. coli, P. aeruginosa, S. pneumoniae, and H. influenzae) and can also be
used to transfer DNA between bacteria and yeast or between bacteria and mammalian cells.
To exploit conjugative transfer, substrates are cloned into plasmids having
MOB genes, and tra genes are also provided in cis or in trans to the MOB genes. The effect
of conjugative transfer is very similar to electroporation in that it allows plasmids to move
between cells and allows recombination between any substrate and the products of previous
recombination to occur merely by propagating the culture. The details of how conjugative
transfer is exploited in these vectors are discussed below. The rate of evolution
can also be increased by fusing protoplasts of cells to induce exchange of plasmids or
chromosomes. Fusion can be induced by chemical agents, such as PEG, or viruses or viral
proteins, such as influenza virus hemagglutinin, HSV-1 gB and gD. The rate of evolution can
also be increased by use of mutator host cells (e.g., Mut L, S, D, T, H and ataxia
telangiectasia human cell lines).

Alternatively, plasmids can be propagated together to encourage recombination,
then isolated, pooled, and reintroduced into cells. The combination of plasmids is different in
each cell and recombination further increases the sequence diversity within the population.
This is optionally carried out recursively until the desired level of diversity is achieved. The
population is then screened and selected and this process optionally repeated with any selected
cells/plasmids.

The time for which cells are propagated and recombination is allowed to occur,
of course, varies with the cell type but is generally not critical, because even a small degree of
recombination can substantially increase diversity relative to the starting materials. Cells
bearing plasmids containing recombined genes are subject to screening or selection for a
desired function. For example, if the substrate being evolved contains a drug resistance gene,
one selects for drug resistance. Cells surviving screening or selection can be subjected to one
or more rounds of screening/selection followed by recombination or can be subjected directly
to an additional round of recombination.

The next round of recombination can be achieved by several different formats
independently of the previous round. For example, a further round of recombination can be
[...]
of plasmids described above. Alternatively, a fresh substrate or substrates, the same or
different from previous substrates, can be transfected into cells surviving selection/screening.
Optionally, the new substrates are included in plasmid vectors bearing a different selective
marker and/or from a different incompatibility group than the original plasmids. As a further
alternative, cells surviving selection/screening can be subdivided into two subpopulations, and
plasmid DNA from one subpopulation transfected into the other, where the substrates from
the plasmids from the two subpopulations undergo a further round of recombination. In either
of the latter two options, the rate of evolution can be increased by employing DNA extraction,
electroporation, conjugation or mutator cells, as described above. In a still further variation,
DNA from cells surviving screening/selection can be extracted and subjected to in vitro DNA
shuffling.

After the second round of recombination, a second round of screening/selection
is performed, preferably under conditions of increased stringency. If desired, further rounds of
recombination and selection/screening can be performed using the same strategy as for the
second round. With successive rounds of recombination and selection/screening, the surviving
recombined substrates evolve toward acquisition of a desired phenotype. Typically, in this and
other methods of recursive recombination, the final product of recombination that has acquired
the desired phenotype differs from starting substrates at 0.1%-25% of positions and has
evolved at a rate orders of magnitude in excess (e.g., by at least 10-fold, 100-fold, 1000-fold,
or 10,000-fold) of the rate of naturally acquired mutation of about 10^-9 mutations per
position per generation (see Anderson & Hughes, Proc. Natl. Acad. Sci. USA 93, 906-907
(1996)). As with other techniques herein, recombination steps can be performed recursively to
enhance diversity prior to screening. In addition, the entire process can be performed in a
recursive manner to generate desired organisms, clones or nucleic acids.

3. Virus-Plasmid Recombination
The strategy used for plasmid-plasmid recombination can also be used for
virus-plasmid recombination; usually, phage-plasmid recombination. However, some
additional comments particular to the use of viruses are appropriate. The initial substrates for
recombination are cloned into both plasmid and viral vectors. It is usually not critical which
substrate(s) are inserted into the viral vector and which into the plasmid, although usually the
viral vector should contain different substrate(s) from the plasmid. As before, the plasmid
(and the virus) typically contains a selective marker. The plasmid and viral vectors can both be
introduced into cells by transfection as described above. However, a more efficient procedure
is to transform the cells with plasmid, select transformants and infect the transformants with a
virus. Because the efficiency of infection of many viruses approaches 100% of cells, most
cells transformed and infected by this route contain both a plasmid and virus bearing different
substrates.

Homologous recombination occurs between plasmid and virus, generating both
recombined plasmids and recombined virus. For some viruses, such as filamentous phage, in
which intracellular DNA exists in both double-stranded and single-stranded forms, both can
participate in recombination. Provided that the virus is not one that rapidly kills cells,
recombination can be augmented by use of electroporation or conjugation to transfer plasmids
between cells. Recombination can also be augmented for some types of virus by allowing the
progeny virus from one cell to reinfect other cells. For some types of virus, virus-infected
cells show resistance to superinfection. However, such resistance can be overcome by
infecting at high multiplicity and/or using mutant strains of the virus in which resistance to
superinfection is reduced. Recursive infection and transformation prior to screening can be
performed to enhance diversity.

The result of infecting plasmid-containing cells with virus depends on the
nature of the virus. Some viruses, such as filamentous phage, stably exist with a plasmid in the
cell and also extrude progeny phage from the cell. Other viruses, such as lambda having a
cosmid genome, stably exist in a cell like plasmids without producing progeny virions. Other
viruses, such as the T-phage and lytic lambda, undergo recombination with the plasmid but
ultimately kill the host cell and destroy plasmid DNA. For viruses that infect cells without
killing the host, cells containing recombinant plasmids and virus can be screened/selected using
the same approach as for plasmid-plasmid recombination. Progeny virus extruded by cells
surviving selection/screening can also be collected and used as substrates in subsequent rounds
of recombination. For viruses that kill their host cells, recombinant genes resulting from
recombination reside only in the progeny virus. If the screening or selective assay requires
expression of recombinant genes in a cell, the recombinant genes should be transferred from
the progeny virus to another vector, e.g., a plasmid vector, and retransfected into cells before
selection/screening is performed.

For filamentous phage, the products of recombination are present in both cells
surviving recombination and in phage extruded from these cells. The dual source of
[...]
in vitro recombination. Alternatively, the progeny phage can be used to transfect or infect
cells surviving a previous round of screening/selection, or fresh cells transfected with fresh
substrates for recombination.

4. Virus-Virus Recombination
The principles described for plasmid-plasmid and plasmid-viral recombination
can be applied to virus-virus recombination with a few modifications. The initial substrates for
recombination are cloned into a viral vector. Usually, the same vector is used for all
substrates. Preferably, the virus is one that, naturally or as a result of mutation, does not kill
cells. After insertion, some viral genomes can be packaged in vitro. The packaged viruses are
used to infect cells at high multiplicity such that there is a high probability that a cell receives
multiple viruses bearing different substrates.

After the initial round of infection, subsequent steps depend on the nature of
infection as discussed in the previous section. For example, if the viruses have phagemid
genomes such as lambda cosmids or M13, F1 or Fd phagemids, the phagemids behave as
plasmids within the cell and undergo recombination simply by propagating in the cells.
Recombination and sequence diversity can be enhanced by electroporation of cells. Following
selection/screening, cosmids containing recombinant genes can be recovered from surviving
cells (e.g., by heat induction of a cos- lysogenic host cell), repackaged in vitro, and used to
infect fresh cells at high multiplicity for a further round of recombination.

If the viruses are filamentous phage, recombination of replicating form DNA 
occurs by propagating the culture of infected cells. Selection/screening identifies colonies of 
cells containing viral vectors having recombinant genes with improved properties, together 
with phage extruded from such cells. Subsequent options are essentially the same as for
plasmid-viral recombination. 

5. Chromosome-Plasmid Recombination
This format can be used to evolve both the chromosomal and plasmid-borne
substrates. The format is particularly useful in situations in which many chromosomal genes
contribute to a phenotype or one does not know the exact location of the chromosomal
gene(s) to be evolved. The initial substrates for recombination are cloned into a plasmid
vector. If the chromosomal gene(s) to be evolved are known, the substrates constitute a
family of sequences showing a high degree of sequence identity but some divergence from the
chromosomal gene. If the chromosomal genes to be evolved have not been located, the initial
substrates usually constitute a library of DNA segments of which only a small number show
sequence identity to the gene or gene(s) to be evolved. Divergence between plasmid-borne
substrate and the chromosomal gene(s) can be induced by mutagenesis or by obtaining the
plasmid-borne substrates from a different species than that of the cells bearing the
chromosome.

The plasmids bearing substrates for recombination are transfected into cells
having chromosomal gene(s) to be evolved. Evolution can occur simply by propagating the
culture, and can be accelerated by transferring plasmids between cells by conjugation,
electroporation or protoplast fusion. Evolution can be further accelerated by use of mutator
host cells or by seeding a culture of nonmutator host cells being evolved with mutator host
cells and inducing intercellular transfer of plasmids by electroporation, conjugation or
protoplast fusion. Alternatively, recursive isolation and transformation can be used.
Preferably, mutator host cells used for seeding contain a negative selection marker to facilitate
isolation of a pure culture of the nonmutator cells being evolved. Selection/screening
identifies cells bearing chromosomes and/or plasmids that have evolved toward acquisition of
a desired function.

Subsequent rounds of recombination and selection/screening proceed in similar
fashion to those described for plasmid-plasmid recombination. For example, further
recombination can be effected by propagating cells surviving recombination in combination
with electroporation, conjugative transfer of plasmids, or protoplast fusion. Alternatively,
plasmids bearing additional substrates for recombination [...]
cells. Preferably, such plasmids are from a different incompatibility group and bear a different
selective marker than the original plasmids to allow selection for cells containing at least two
different plasmids. As a further alternative, plasmid and/or chromosomal DNA can be isolated
from a subpopulation of surviving cells and transfected into a second subpopulation.
Chromosomal DNA can be cloned into a plasmid vector before transfection.

6. Virus-Chromosome Recombination
As in the other methods described above, the virus is usually one that does not
kill the cells, and is often a phage or phagemid. The procedure is substantially the same as for
plasmid-chromosome recombination. Substrates for recombination are cloned into the vector.
Vectors including the substrates can then be transfected into cells or in vitro packaged and
introduced into cells by infection. Viral genomes recombine with host chromosomes merely
by propagating a culture. Evolution can be accelerated by allowing intercellular transfer of
[...]
Screening/selection identifies cells having chromosomes and/or viral genomes that have
evolved toward acquisition of a desired function.

There are several options for subsequent rounds of recombination. For 
example, viral genomes can be transferred between cells surviving selection/recombination by 
recursive isolation and transfection and electroporation. Alternatively, viruses extruded from 
cells surviving selection/screening can be pooled and used to superinfect the cells at high 
multiplicity. Alternatively, fresh substrates for recombination can be introduced into the cells, 
either on plasmid or viral vectors. 
CC. POOLWISE WHOLE GENOME RECOMBINATION
Asexual evolution is a slow and inefficient process. Populations move as
individuals rather than as a group. A diverse population is generated by mutagenesis of a
single parent, resulting in a distribution of fit and unfit individuals. In the absence of a sexual
cycle, each piece of genetic information for the surviving population remains in the individual
mutants. Selection of the fittest results in many fit individuals being discarded, along with the
genetically useful information they carry. Asexual evolution proceeds one genetic event at a
time, and is thus limited by the intrinsic value of a single genetic event. Sexual evolution
moves more quickly and efficiently. Mating within a population consolidates genetic
information within the population and results in useful information being combined together.
The combining of useful genetic information results in progeny that are much more fit than
their parents. Sexual evolution thus proceeds much faster by multiple genetic events. These
differences are further illustrated in Fig. 17. In contrast to sexual evolution, DNA shuffling is
the recursive mutagenesis, recombination, and selection of DNA sequences (see also, Fig.
25).

Sexual recombination in nature effects pairwise recombination and results in
progeny that are genetic hybrids of two parents. In contrast, DNA shuffling in vitro effects
poolwise recombination, in which progeny are hybrids of multiple parental molecules. This is
because DNA shuffling effects many individual pairwise recombination events with each
thermal cycle. After many cycles the result is a repetitively inbred population, with the
"progeny" being the Fx (for X cycles of reassembly) of the original parental molecules. These
progeny are potentially descendants of many or all of the original parents. The graph shown in
Fig. 25 shows a plot of the potential number of mutations an individual can accumulate by
sequential, pairwise and poolwise recombination.
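
The comparison of sequential, pairwise and poolwise accumulation of mutations can be
expressed as a simple upper-bound calculation. The following Python sketch is one
hypothetical reading of that comparison and does not reproduce the data of Fig. 25. It
assumes that sequential mutagenesis adds at most one mutation per round, that each pairwise
cross can at most double the number of mutations carried by an individual (the parents each
starting with one), and that a single poolwise experiment can in principle combine every
mutation present in the pool. The function name and its parameters are illustrative.

```python
def max_mutations(mode, rounds, pool_size):
    """Upper bound on mutations in one individual after `rounds` rounds.

    mode: "sequential", "pairwise" or "poolwise" (assumed models, see lead-in).
    pool_size: number of distinct single mutations available in the population.
    """
    if mode == "sequential":
        best = rounds                 # one new mutation per round
    elif mode == "pairwise":
        best = 2 ** rounds            # each cross can double the count
    elif mode == "poolwise":
        best = pool_size if rounds >= 1 else 1  # all parents can contribute at once
    else:
        raise ValueError(f"unknown mode: {mode}")
    return min(best, pool_size)       # cannot exceed what the pool contains
```

Under these assumptions, with a pool of 10 single mutants, three rounds give bounds of 3
(sequential), 8 (pairwise) and 10 (poolwise) mutations per individual.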

Poolwise recombination is an important feature of DNA shuffling in that it
provides a means of generating a greater proportion of the possible combinations of mutations
from a single "breeding" experiment. In this way, the "genetic potential" of a population can
be readily assessed by screening the progeny of a single DNA shuffling experiment.

For example, if a population consists of 10 single mutant parents, there are 2^10
= 1024 possible combinations of those mutations, ranging from progeny having 0-10
mutations. Of these 1024, only 56 will result from a single pairwise cross (Fig. 14) (i.e., those
having 0, 1, and 2 mutations). In nature the multiparent combinations will eventually arise
after multiple random sexual matings, assuming no selection is imparted to remove some
mutations from the population. In this way, sex effects the consolidation and sampling of all
useful mutant combinations possible within a population. For the purposes of directed
evolution, having the greatest number of mutant combinations entering a screen or selection is
desirable so that the best progeny (i.e., according to the selection criteria used in the selection
screen) is identified in the shortest possible time.
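
The figures of 1024 and 56 follow from elementary counting and can be checked with a short
calculation. The Python sketch below is provided for illustration only; the function names are
hypothetical. It counts all subsets of n independent mutations, and the subsets containing no
more than a given number of mutations, which for a single pairwise cross between single
mutant parents is two.

```python
from math import comb


def total_combinations(n_mutations):
    """All possible combinations of n independent mutations (each present or absent)."""
    return 2 ** n_mutations


def reachable_combinations(n_mutations, max_per_progeny):
    """Combinations carrying at most `max_per_progeny` of the n mutations."""
    return sum(comb(n_mutations, k) for k in range(max_per_progeny + 1))


# Ten single mutant parents; one pairwise cross yields progeny with 0, 1 or 2 mutations.
print(total_combinations(10))         # 1024
print(reachable_combinations(10, 2))  # 1 + 10 + 45 = 56
```

The 56 combinations thus represent about 5.5% of the 1024 possible genotypes.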
One challenge to in vivo and whole genome shuffling is devising methods for
effecting poolwise recombination or multiple repetitive pairwise recombination events. In
crosses with a single pairwise cross per cycle before screening, the ability to screen the
"genetic potential" of the starting population is limited. For this reason,
whole genome shuffling mediated cellular evolution would be [...]
recombination. Two strategies for poolwise recombination are described below (protoplast
fusion and transduction).

1. Protoplast Fusion 
Protoplast fusion (discussed supra) mediated whole genome shuffling (WGS) is
one format that can directly effect poolwise recombination. Whole genome shuffling is the
recursive recombination of whole genomes, in the form of one or more nucleic acid
molecule(s) (fragments, chromosomes, episomes, etc.), from a population of organisms,
resulting in the production of new organisms having distributed genetic information from at
least two of the starting population of organisms. The process of protoplast fusion is further
illustrated in Fig. 26.

Progeny resulting from the fusion of multiple parent protoplasts have been 
observed (Hopwood & Wright, 1978), however, these progeny are rare (l O^-lO-*). The low 
frequency is attributed to the distribution of fusants arising from two, three, four, etc. parents
and the likelihood of the multiple recombination events (6 crossovers for a four parent cross)
that would have to occur. Thus, it is helpful to enrich for
multiparent progeny. This can be accomplished, e.g., by repetitive fusion or enrichment for
multiply fused protoplasts. The process of poolwise fusion and recombination is further
illustrated in Fig. 27.

2. Repetitive Fusion
Protoplasts of identified parental cells are prepared, fused and regenerated.
Protoplasts of the regenerated progeny are then, without screening or enrichment, formed,
fused and regenerated. This can be carried out for two, three, or more cycles before screening
to increase the representation of multiparent progeny. The number of possible
mutations per progeny doubles for each cycle. For example, if one cross produces predominantly
progeny with 0, 1, and 2 mutations, a breeding of this population with itself will produce
progeny with 0, 1, 2, 3, and 4 mutations (Fig. 15), the third cross up to eight, etc. The
representation of the multiparent progeny from these subsequent crosses will not be as high as
the single and double parent progeny, but it will be detectable and much higher than from a
single cross. The repetitive fusion prior to screening is analogous to many sexual crosses
within a population, and to the individual thermal cycles of in vitro DNA shuffling described
supra. A factor affecting the value of this approach is the starting size of the parental
population. As the population grows, it becomes more likely that a multiparent fusion will
arise from repetitive fusions. For example, if 4 parents are fused twice, the 4 parent progeny
will make up approximately 0.2% of the total progeny. This is sufficient to find in a
population of 3000 (95% confidence), but better representation is preferable. If ten parents
are fused twice, >20% of the progeny will be four parent offspring.
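
The doubling rule and the screening-size argument above can be sketched numerically. The
code below is a hedged illustration, not part of the described method: the function names
are hypothetical, and it assumes that each screened clone is an independent draw from a
large progeny population in which the desired class occurs at a fixed frequency.

```python
from math import ceil, log


def max_mutations_after(cycles: int, n_parents: int) -> int:
    """Upper bound on mutations per progeny: doubles each cycle, capped by the pool size."""
    return min(2 ** cycles, n_parents)


def prob_at_least_one(frequency: float, n_screened: int) -> float:
    """Chance of seeing at least one desired clone among n_screened independent clones."""
    return 1.0 - (1.0 - frequency) ** n_screened


def clones_needed(frequency: float, confidence: float) -> int:
    """Smallest screen size giving at least the requested confidence."""
    return ceil(log(1.0 - confidence) / log(1.0 - frequency))


if __name__ == "__main__":
    for cycle in (1, 2, 3):
        print(cycle, max_mutations_after(cycle, n_parents=10))
    # Four parent progeny at ~0.2% of the total, screened in a population of 3000.
    print(prob_at_least_one(0.002, 3000))
    print(clones_needed(0.002, 0.95))
```

Under the independence assumption, a screen of 3000 clones comfortably exceeds the 95%
confidence level for a class present at 0.2%, consistent with the statement above.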

3. Enrichment for multiply fused protoplasts

After the fusion of a population of protoplasts, the fusants are typically diluted
into hypotonic medium, to dilute out the fusing agent (e.g., 50% PEG). The fused cells can be
grown for a short period to regenerate cell walls or separated directly, and are then separated
on the basis of size. This is carried out, e.g., by cell sorting, using light dispersion as an
estimate of size, to isolate the largest fusants. Alternatively, the fusants can be sorted by FACS
on the basis of DNA content. The large fusants or those containing more DNA result from the
fusion of multiple parents and are more likely to segregate to multiparent progeny. The
enriched fusants are regenerated and screened directly, or the progeny are fused recursively as
above to further enrich the population for diverse mutant combinations.

4. Transduction

Transduction can theoretically effect poolwise recombination, if the transducing
phage particles contain predominantly host genomic DNA rather than phage DNA. If phage
DNA is overly represented, then most cells will receive at least one undesired phage genome.
Phage particles generated from locked-in prophage (supra) are useful for this purpose. A
population of cells is infected with an appropriate transducing phage, and the lysate is
collected and used to infect the same starting population. A high multiplicity of infection is
employed to deliver multiple genomic fragments to each infected cell, thereby increasing the
chance of producing recombinants containing mutations from more than two parent genomes.
The resulting transductants are recovered under conditions where phage cannot propagate,
e.g., in the presence of citrate. This population is then screened directly or infected again with
phage, with the resulting transducing particles being used to transduce the first progeny. This
would mimic recursive protoplast fusion, multiple sexual recombination, and in vitro DNA
shuffling.

DD. METHODS FOR WHOLE GENOME SHUFFLING BY BLIND
FAMILY SHUFFLING OF PARSED GENOMES AND RECURSIVE
CYCLES OF FORCED INTEGRATION AND EXCISION BY
HOMOLOGOUS RECOMBINATION AND SCREENING FOR
IMPROVED PHENOTYPES

In vitro methods have been developed to shuffle single genes and operons, as
set forth, e.g., herein. "Family" shuffling of homologous genes within species and from
different species is also an effective method for accelerating molecular evolution. This
section describes additional methods for extending these methods such that they can be
applied to whole genomes.

In some cases, the genes that encode rate limiting steps in a biochemical
process, or that contribute to a phenotype of interest, are known. This method can be used to
target family shuffled libraries to such loci, generating libraries of organisms with high quality
family shuffled libraries of alleles at the locus of interest. An example would be
the evolution of a host chaperonin to more efficiently chaperone the folding of an
overexpressed protein in E. coli.

The goals of this process are to shuffle homologous genes from two or more
species and to then integrate the shuffled genes into the chromosome of a target organism.
Integration of multiple shuffled genes at multiple loci can be achieved using recursive cycles of
integration (generating duplications), excision (leaving the improved allele in the chromosome)
and transfer of additional evolved genes by serially applying the same procedure.

In the first step, genes to be shuffled are subcloned into suitable bacterial
vectors. These vectors can be plasmids, cosmids, BACs or the like. Thus, fragments from
100 bp to 100 kb can be handled. Homologous fragments are then "family shuffled" together
(i.e., homologous fragments from different species or chromosomal locations are
homologously recombined). As a simple case, homologs from two species (say, E. coli and
Salmonella) are cloned, family shuffled in vitro and cloned into an allele replacement vector
(e.g., a vector with a positively selectable marker, a negatively selectable marker and a
conditionally active origin of replication). The basic strategy for whole genome family
shuffling of parsed (subcloned) genomes is additionally set forth in Fig. 22.

The vectors are transfected into E. coli and selected, e.g., for drug resistance.
Most drug resistant cells should arise by homologous recombination between a family shuffled
insert and a chromosomal copy of the cloned insert. Colonies are screened for improved
phenotype (e.g., by mass spectroscopy for enzyme activity or small molecule production, or a
chromogenic screen, or the like, depending on the phenotype to be assayed). Negative
selection (i.e., suc selection) is imposed to force excision of the tandem duplication. Roughly half
of the colonies should retain the improved phenotype. Importantly, this process regenerates a
"clean" chromosome in which the wild type locus is replaced with a family shuffled fragment
that encodes a beneficial allele. Since the chromosome is "clean" (i.e., has no vector
sequences), other improved alleles can also be moved into this point on the chromosome by
homologous recombination.

Selection or screening for improved phenotype can occur either after step 3 or
step 4 in Figure 22. If selection or screening takes place after step 3, then the improved allele
can be conveniently moved to other strains by, for example, P1 transduction. One can then
regenerate a strain containing the improved allele but lacking vector sequences by "negative
selection" against the suc marker. In subsequent rounds, independently identified improved
variants of the gene can be sequentially moved into the improved strain (e.g., by P1
transduction of the drug marked tandem duplication above). Transductants are screened for
further improvement in phenotype by virtue of receiving the transduced tandem duplication,
which itself contains the family shuffled genetic material. Negative selection is again imposed
and the process of shuffling the improved strain is recursively repeated as desired.

Although this process was described with reference to targeting a gene or
genes of interest, it can be used "blindly," making no assumptions about which locus is to be
targeted. This procedure is set forth in Fig. 23. For example, the whole genome of an
organism of interest is cloned into manageable fragments (e.g., 10 kb for plasmid-based
methods). Homologous fragments are then isolated from related species by the method shown
in Figure 23. Forced recombination with chromosomal homologs creates chimeras (Fig. 22).

EE: METHODS FOR HIGH THROUGHPUT FAMILY SHUFFLING OF 
GENES 

For E. coli, cloning the genome in 10 kb fragments requires about 300 clones.
The homologous fragments are isolated, e.g., from Salmonella. This gives roughly three
hundred pairs of homologous fragments. Each pair is family shuffled and the shuffled
fragments are cloned into an allele replacement vector. The inserts are integrated into the E.
coli genome as described above. A global screen is made to identify variants with an
improved phenotype. This serves as the basic collection of improvements that are to be
shuffled to produce a desired strain. The shuffling of these independently identified variants
into one super strain is done as described above.

Family shuffling has been shown to be an efficient method for creating high
quality libraries of genetic variants. Given a cloned gene from one species, it is of interest to
rapidly isolate homologs from other species, and this process can be rate limiting.
For example, if one wants to perform family shuffling on an entire genome, one may need to
construct hundreds to thousands of individual family shuffled libraries.

In this embodiment, a gene of interest is optionally cloned into a vector in
which ssDNA can be made. An example of such a vector is a phagemid vector with an M13
origin of replication. Genomic DNA or cDNA from a species of interest is isolated,
denatured, annealed to the phagemid, and then enzymatically manipulated to clone it. The
cloned DNA is then used to family shuffle with the original gene of interest. PCR based
formats are also available as outlined in Figure 24. These formats require no intermediate
cloning steps, and are, therefore, of particular interest for high throughput applications.

Alternatively, the gene of interest can be fished out using purified RecA
protein. The gene of interest is PCR amplified using primers that are tagged with an affinity
tag such as biotin, denatured, then coated with RecA protein (or an improved variant thereof).
The coated ssDNA is then mixed with a gDNA plasmid library. Under the appropriate
conditions, such as in the presence of non-hydrolyzable rATP analogs, RecA will catalyze the
hybridization of the RecA coated gene (ssDNA) in the plasmid library. The heteroduplex is
then affinity purified from the non-hybridizing plasmids of the gene library by adsorption of the
labeled PCR products and their associated homologous DNA to an appropriate affinity matrix.
The homologous DNA is used in a family shuffling reaction for improvement of the desired
function.

Shuffling the E. coli chaperonin gene DnaJ with other homologs is described
below as an example. The example can be generalized to any other gene, including eukaryotic
genes such as plant or animal genes (including mammalian genes), by following the format
described. Fig. 24 provides a schematic outline of the steps to high throughput family
shuffling.

As a first step, the E. coli DnaJ gene is cloned into an M13 phagemid vector.
ssDNA is then produced, preferably in a dut(-) ung(-) strain so that Kunkel site directed
mutagenesis protocols can be applied. Genomic DNA is then isolated from a non-E. coli
source, such as Salmonella and Yersinia pestis. The bacterial genomic DNAs are denatured
and reannealed to the phagemid ssDNA (e.g., about 1 microgram of ssDNA). The reannealed
product is treated with an enzyme such as Mung Bean nuclease that degrades ssDNA as an
exonuclease but not as an endonuclease (the nuclease does not degrade mismatched DNA that
is embedded in a larger annealed fragment). The standard Kunkel site directed mutagenesis
protocol is used to extend the fragment and the target cells are transformed with the resulting
mutagenized DNA.

In a first variation on the above, the procedure is adapted to the situation where
the target gene or genes of interest are unknown. In this variation, the whole genome of the
organism of interest is cloned in fragments (e.g., of about 10 kb each) into a phagemid. Single
stranded phagemid DNA is then produced. Genomic DNA from the related species is
denatured and annealed to the phagemids. Mung bean nuclease is used to trim away
unhybridized DNA ends. Polymerase plus ligase is used to fill in the resulting gapped circles.
These clones are transformed into a mismatch repair deficient strain. When the mismatched
molecules are replicated in the bacteria, most colonies contain both the E. coli and the
homologous fragment. The two homologous genes are then isolated from the colonies (e.g.,
either by standard plasmid purification or colony PCR) and shuffled.

Another approach to generating chimeras that requires no in vitro shuffling is
simply to clone the Salmonella genome into an allele replacement vector, transform E. coli,
and select for chromosomal integrants. Homologous recombination between Salmonella
genes and E. coli homologs generates shuffled chimeras. A global screen is done to screen for
improved phenotypes. Alternatively, recursive transformation and recombination is performed
to increase diversity prior to screening. If colonies with improved phenotypes are obtained, it
is verified that the improvement is due to allele replacement by P1 transduction into a fresh
strain and counterscreening for improved phenotype. A collection of such improved alleles can
then be combined into one strain using the methods for whole genome shuffling by blind family
shuffling of parsed genomes as set forth herein. Additionally, once these loci are identified, it
is likely that further rounds of shuffling and screening will yield further improvements. This
could be done by cloning the chimeric gene and then using the methods described in this
disclosure to breed the gene with homologs from many different strains of bacteria.

In general, the transformants contain clones of the homologue of the target
gene (e.g., E. coli DnaJ in the example above). Mismatch repair in vivo results in a decrease
in diversity of the gene. There are at least two solutions to this. First, transduction can be
performed into a mismatch repair deficient strain. Alternatively or in addition, the M13
template DNA can be selectively degraded, leaving the cloned homologue. This can be done
using methods similar to the standard Eckstein site directed mutagenesis technique (General
texts which describe general molecular biological techniques useful herein, including
mutagenesis, include Sambrook et al., Molecular Cloning - A Laboratory Manual (2nd Ed.),
Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989
("Sambrook") and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current
Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons,
Inc., (supplemented through 1998) ("Ausubel")).

This method relies on incorporation of alpha thiol modified dNTPs during
synthesis of the new strand, followed by selective degradation of the template and resynthesis
of the template strand. In one embodiment, the template strand is grown in a dut(-) ung(-)
strain so that uracil is incorporated into the phagemid DNA. After extension as noted above
(and before transformation) the DNA is treated with uracil glycosylase and an apurinic site
endonuclease such as Endo III or Endo IV. The treated DNA is then treated with a
processive exonuclease that resects from the resulting gaps while leaving the other strand
intact (as in Eckstein mutagenesis). The DNA is polymerized and ligated. Target cells are
then transformed. This process enriches for clones encoding the homologue which is not
derived from the target (i.e., in the example above, the non-E. coli homologue).

An analogous procedure is optionally performed in a PCR format. As applied
to the DnaJ illustration above, DnaJ DNA is amplified by PCR with primers that build 30-mer
priming sites on each end. The PCR product is denatured and annealed with an excess of Salmonella
genomic DNA. The Salmonella DnaJ gene hybridizes with the E. coli homologue. After
treatment with Mung Bean nuclease, the resulting mismatched hybrid is PCR amplified with
primers to the 30-mer priming sites. See, e.g., Fig. 24.

As genomics provides an increasing amount of sequence information, it is
increasingly possible to directly PCR amplify homologs with designed primers. For example,
given the sequence of the E. coli genome and of a related genome (e.g., Salmonella), each
genome can be PCR amplified with designed primers in, e.g., 5 kb fragments. The
homologous fragments can be put together in a pairwise fashion for shuffling. For genome
shuffling, the shuffled products are cloned into the allele replacement vector and bred into the
genome as described supra.

FF. HYPER-RECOMBINOGENIC RECA CLONES

The invention further provides hyper-recombinogenic RecA proteins (see the
examples below). Examples of such proteins are from clones 2, 4, 5, 6 and 13 shown in Fig.
13. It is fully expected that one of skill can make a variety of related recombinogenic proteins
given the disclosed sequences.

Clones comprising the sequences in Figs. 12 and 13 are optionally used as the
starting point for any of the shuffling methods herein, providing a starting point for mutation
and recombination to improve the clones which are shown.

Standard molecular biological techniques can be used to make nucleic acids
which comprise the given nucleic acids, e.g., by cloning the nucleic acids into any known
vector. Examples of appropriate cloning and sequencing techniques, and instructions
sufficient to direct persons of skill through many cloning exercises are found in Berger and
Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152,
Academic Press, Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning -
A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring
Harbor Press, NY, (Sambrook); and Current Protocols in Molecular Biology, F.M. Ausubel
et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and
John Wiley & Sons, Inc., (1994 Supplement) (Ausubel). Product information from
manufacturers of biological reagents and experimental equipment also provides information
useful in known biological methods. Such manufacturers include the SIGMA chemical
company (Saint Louis, MO), R&D systems (Minneapolis, MN), Pharmacia LKB
Biotechnology (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem
Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO
BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika
(Fluka Chemie AG, Buchs, Switzerland), Invitrogen, San Diego, CA, and Applied Biosystems
(Foster City, CA), as well as many other commercial sources known to one of skill.

It will be appreciated that conservative substitutions of the given sequences can
be used to produce nucleic acids which encode hyper-recombinogenic clones. "Conservatively
modified variations" of a particular nucleic acid sequence refers to those nucleic acids which
encode identical or essentially identical amino acid sequences, or where the nucleic acid does
not encode an amino acid sequence, to essentially identical sequences. Because of the
degeneracy of the genetic code, a large number of functionally identical nucleic acids encode
any given polypeptide. For instance, the codons CGU, CGC, CGA, CGG, AGA, and AGG all
encode the amino acid arginine. Thus, at every position where an arginine is specified by a
codon, the codon can be altered to any of the corresponding codons described without altering
the encoded polypeptide. Such nucleic acid variations are "silent variations," which are one
species of "conservatively modified variations." Every nucleic acid sequence herein which
encodes a polypeptide also describes every possible silent variation. One of skill will
recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon
for methionine) can be modified to yield a functionally identical molecule by standard
techniques. Accordingly, each "silent variation" of a nucleic acid which encodes a polypeptide
is implicit in any described sequence. Furthermore, one of skill will recognize that individual
substitutions, deletions or additions which alter, add or delete a single amino acid or a small
percentage of amino acids (typically less than 5%, more typically less than 1%) in an encoded
sequence are "conservatively modified variations" where the alterations result in the
substitution of an amino acid with a chemically similar amino acid. Conservative substitution
tables providing functionally similar amino acids are well known in the art. The following six
groups each contain amino acids that are conservative substitutions for one another: 1)
Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine
(N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine
(M), Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). See also,
Creighton (1984) Proteins, W.H. Freeman and Company. Finally, the addition of sequences
which do not alter the encoded activity of a nucleic acid molecule, such as a non-functional
sequence, is a conservative modification of the basic nucleic acid.
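
The six substitution groups listed above lend themselves to a simple lookup. The following
is a minimal sketch, assuming one-letter amino acid codes and aligned sequences of equal
length; the names CONSERVATIVE_GROUPS, is_conservative and classify_substitutions are
hypothetical and are introduced only for illustration.

```python
# The six groups of mutually conservative amino acids, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Alanine, Serine, Threonine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(a: str, b: str) -> bool:
    """True if replacing residue a with b stays within one of the six groups."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str):
    """List (position, ref, var, conservative?) for each differing aligned residue."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


if __name__ == "__main__":
    print(classify_substitutions("MKVLA", "MRVLD"))
    # [(2, 'K', 'R', True), (5, 'A', 'D', False)]
```

The sketch handles substitutions only; deletions and additions, which the text also
contemplates, would require an alignment step that is not shown here.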

One of skill will appreciate that many conservative variations of the nucleic acid
constructs disclosed yield a functionally identical construct. For example, due to the
degeneracy of the genetic code, "silent substitutions" (i.e., substitutions of a nucleic acid
sequence which do not result in an alteration in an encoded polypeptide) are an implied feature
of every nucleic acid sequence which encodes an amino acid. Similarly, "conservative amino
acid substitutions," in which one or a few amino acids in an amino acid sequence of a packaging or
packageable construct are substituted with different amino acids with highly similar properties,
are also readily identified as being highly similar to a disclosed construct. Such conservatively
substituted variations of each explicitly disclosed sequence are a feature of the present
invention.
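
To make the scale of codon degeneracy concrete, the sketch below builds the standard
genetic code and counts how many distinct coding sequences specify a given peptide. It is
an illustration under stated assumptions: the standard code is used (some organisms and
organelles deviate from it), codons are written in RNA letters, and the names CODON_TABLE,
SYNONYMOUS and count_encodings are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, with codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Map each amino acid (or "*" for stop) to the list of codons that encode it.
SYNONYMOUS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS.setdefault(aa, []).append(codon)


def count_encodings(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (the original plus all silent variants)."""
    return prod(len(SYNONYMOUS[aa]) for aa in peptide.upper())


if __name__ == "__main__":
    print(sorted(SYNONYMOUS["R"]))  # the six arginine codons
    print(SYNONYMOUS["M"])          # ['AUG']
    print(count_encodings("MR"))    # 6
```

Because the count is a product over residues, it grows very quickly with peptide length,
which is why silent variations are described as implicit rather than listed.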

Nucleic acids which hybridize under stringent conditions to the nucleic acids in
the figures are a feature of the invention. "Stringent hybridization wash conditions" in the
context of nucleic acid hybridization experiments such as Southern and northern hybridizations
are sequence dependent, and are different under different environmental parameters. An
extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory
Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes,
part I, chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe
assays", Elsevier, New York. Generally, highly stringent hybridization and wash conditions
are selected to be about 5° C lower than the thermal melting point (Tm) for the specific
sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic
strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe.
Very stringent conditions are selected to be equal to the Tm for a particular probe. In general,
a signal to noise ratio of 2x (or higher) than that observed for an unrelated probe in the
particular hybridization assay indicates detection of a specific hybridization.
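
The two numerical criteria in the preceding paragraph can be written down directly. This
is a minimal sketch, assuming the Tm has already been measured or estimated for the probe
under the chosen ionic strength and pH; the function names and the stringency labels are
hypothetical.

```python
# Offsets below the probe Tm, in degrees Celsius, for the two stringency levels described.
STRINGENCY_OFFSET_C = {
    "high": 5.0,       # about 5 degrees C below the Tm
    "very_high": 0.0,  # equal to the Tm
}


def wash_temperature(tm_celsius: float, stringency: str = "high") -> float:
    """Hybridization and wash temperature for a probe of known Tm."""
    return tm_celsius - STRINGENCY_OFFSET_C[stringency]


def is_specific_signal(probe_signal: float, unrelated_signal: float,
                       min_ratio: float = 2.0) -> bool:
    """True if the probe signal is at least min_ratio times the unrelated-probe signal."""
    if unrelated_signal <= 0:
        raise ValueError("unrelated-probe signal must be positive")
    return probe_signal / unrelated_signal >= min_ratio


if __name__ == "__main__":
    print(wash_temperature(65.0))               # 60.0
    print(wash_temperature(65.0, "very_high"))  # 65.0
    print(is_specific_signal(210.0, 100.0))     # True
```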

Nucleic acids which do not hybridize to each other under stringent conditions
are still substantially identical if the polypeptides which they encode are substantially identical.
This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon
degeneracy permitted by the genetic code.

Finally, preferred nucleic acids encode hyper-recombinogenic RecA proteins
which are at least one order of magnitude (10 times) as active as a wild-type RecA protein in a
standard assay for RecA activity.

GG. recE / recT MEDIATED SHUFFLING IN VIVO

Like recA, recE and recT (or their homologues, for example the lambda
recombination proteins redα and redβ) can stimulate homologous recombination in vivo. See,
Muyrers et al. (1999) Nucleic Acids Res 27(6):1555-7 and Zhang et al. (1998) Nat Genet
20(2):123-8.

Hyper-recombinogenic recE and recT are evolved by the same method as
described for recA. Alternatively, variants with increased recombinogenicity are selected by
their ability to cause recombination between a suicide vector (lacking an origin of replication)
carrying a selectable marker, and a homologous region in either the chromosome or a stably-
maintained episome.

A plasmid containing recA and recE genes is shuffled (either using these genes
as single starting points, or by family shuffling (with, for example, redα and redβ, or other
homologous genes identified from available sequence databases)). This shuffled library is then
cloned into a vector with a selectable marker and transformed into an appropriate
recombination-deficient strain. The library of cells would then be transformed with a second
selectable marker, either borne on a suicide vector or as a linear DNA fragment with regions at
its ends that are homologous to a target sequence (either in the plasmid or in the host
chromosome). Integration of this marker by homologous recombination is a selectable event,
dependent on the activity of the recE and recT gene products. The recE / recT genes are
isolated from cells in which homologous recombination has occurred. The process is repeated
several times to enrich for the most efficient variants before the next round of shuffling is
performed. In addition, cycles of recombination without selection can be used to
increase the diversity of a cell population prior to selection.

Hyper-recombinogenic recE / recT are used as
described for hyper-recombinogenic recA. For example, they are expressed (constitutively or
conditionally) in a host cell to facilitate homologous recombination between variant gene
fragments and homologues within the host cell. They are alternatively introduced by
[...] variant genes.

Hyper-recombinogenic recE / recT (either of bacterial / phage origin, or from
plant homologues) are useful for facilitating homologous recombination in plants. They are,
for example, cloned into the Agrobacterium cloning vector, where they are expressed upon
entry into the plant, thereby stimulating homologous recombination in the recipient cell.

In a preferred embodiment, recE / recT are used and/or generated in mutS
strains.

HH. MULTI-CYCLIC RECOMBINATION

As noted, protoplast fusion is an efficient means of recombining two microbial
genomes. The process reproducibly results in about 10% of a non-selected population being
recombinant.

Protoplasts are cells that have been stripped of their cell walls by treatment in 
hypotonic medium with cell wall degrading enzymes. Protoplast fusion is the induced fusion 
of the membranes of two or more of these protoplasts by fusogenic agents such as 
polyethylene glycol. Fusion results in cytoplasmic mixing and places the genomes of the fused 
cells within the same membrane. Under these conditions recombination between the genomes 
is frequent. 

The fused protoplasts are regenerated, and, during cell division, single genomes 
segregate into each daughter cell. Typically, 10% of these daughter cells have genomes that 
originate partially from more than one of the original parental protoplast genomes. 

This result is similar to that of the crossing over of sister chromatids in
eukaryotic cells during prophase of meiosis II. The percentage of daughter cells that are
recombinant is just lower after protoplast fusion. While protoplast fusion does result in
efficient recombination, the recombination predominantly occurs between two cells, as in
sexual recombination.

In order to efficiently generate whole genome shuffled libraries,
daughter cells having genetic information originating from multiple parents are made.

In vitro DNA shuffling results in the efficient poolwise recombination of
multiple homologous DNA sequences. The reassembly of full length genes from a mixed pool
of small gene fragments requires multiple annealing and elongation cycles, the thermal cycles
of the primerless PCR reaction. During each thermal cycle, many pairs of fragments anneal
and are extended to form a combinatorial population of larger chimeric DNA fragments. After
the first cycle of reassembly, chimeric fragments contain sequences originating from two
different parent genes. This is similar to the result of a single sexual cycle within a population,
pairwise cross, or protoplast fusion. During the second cycle, these chimeric fragments can
anneal with each other, or with other small fragments, resulting in chimeras originating from
up to four different parental sequences.

This second cycle is analogous to the entire progeny from a single sexual cross
inbreeding with itself. Further cycles will result in chimeras originating from 8, 16, 32, etc.
parental sequences and are analogous to further inbreedings of the progeny population. The
power of in vitro DNA shuffling is that a large combinatorial library can be generated from a
single pool of DNA fragments reassembled by these recursive pairwise "matings." As
described above, in vivo shuffling strategies, such as protoplast fusion, result in a single
pairwise mating reaction. Thus, to generate the level of diversity obtained by in vitro
methods, in vivo methods are carried out recursively. That is, a pool of organisms is
recombined and the progeny pooled, without selection, and then recombined again. This
process is repeated for sufficient cycles to result in progeny having multiple parental
sequences.

Described below is a method used to shuffle four strains of Streptomyces
coelicolor. From the initial four strains, each containing a unique nutritional marker, three to
four rounds of recursive pooled protoplast fusion were sufficient to generate a population of
shuffled organisms containing all 16 possible combinations of the four markers. This
represents a 10^6 fold improvement in the generation of four parent progeny as compared to a
single pooled fusion of the four strains.

As set forth in Figure 31, protoplasts were generated from several strains of S.
coelicolor, pooled and fused. Mycelia were regenerated and allowed to sporulate. The spores
were collected, allowed to grow into mycelia, formed into protoplasts, pooled and fused, and
the process repeated for three to four rounds. The resulting spores were then subjected to
screening.

The basic protocol for generating a whole genome shuffled library from four S.
coelicolor strains, each having one of four distinct markers, was as follows. Four mycelial
cultures, each of a strain having one of four markers, were grown to early stationary
phase. The mycelia from each were harvested by centrifugation and washed. Protoplasts from
each culture were prepared as follows.

Approximately 10^9 S. coelicolor spores were inoculated into 50 ml YEME with
0.5% glycine in a 250 ml baffled flask. The spores were incubated at 30°C for 36-40 hours in
an orbital shaker. Mycelial growth was verified using a microscope. Some strains needed an
additional day of growth. The culture was transferred into a 50 ml tube and centrifuged at
4,000 rpm for 10 min. The mycelium was washed twice with 10.3% sucrose and centrifuged
at 4,000 rpm for 10 min (mycelium can be stored at -80°C after washing). 5 ml of lysozyme was
added to the ~0.5 g of mycelium pellet. The pellet was suspended and incubated at 30°C for
20-60 min, with gentle shaking every 10 min. Protoplasting was checked under the microscope
every 20 min. Once the majority were protoplasts, protoplasting was stopped by adding 10 ml
of P buffer. The protoplasts were filtered through cotton and the protoplast spun down at
3,000 rpm for 7 min at room temperature. The supernatant was discarded and the protoplast
pellet was resuspended (in about 500 µl). Ten-fold serial dilutions were made in P buffer, and
the protoplasts counted at a 10^-2 dilution. Protoplasts were adjusted to 10^10 protoplasts per ml.
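The counting and adjustment arithmetic can be sketched as follows. This is an illustrative calculation only: the counted volume (for example, one square of a counting chamber) and the function names are assumptions not stated in the protocol.

```python
def protoplasts_per_ml(count, counted_volume_ml, dilution):
    """Concentration of the undiluted suspension, from a count made on a
    diluted sample. `dilution` is the dilution of the counted sample,
    e.g. 1e-2 for a 10^-2 dilution."""
    return count / (counted_volume_ml * dilution)

def volume_containing(target_protoplasts, stock_per_ml):
    """Volume (ml) of stock that contains `target_protoplasts` protoplasts."""
    return target_protoplasts / stock_per_ml

if __name__ == "__main__":
    # Hypothetical count: 100 protoplasts seen in 1e-4 ml of the 10^-2 dilution.
    stock = protoplasts_per_ml(100, 1e-4, 1e-2)
    print(f"stock: {stock:.2e} protoplasts/ml")
    # Volume of a 10^10/ml suspension that holds 10^8 protoplasts.
    print(f"volume for 1e8: {volume_containing(1e8, 1e10):.3f} ml")
```

With a suspension adjusted to 10^10 protoplasts per ml, 10^8 protoplasts correspond to 10 µl of stock.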

The protoplasts from each culture were quantitated by microscopy. 10^8
protoplasts from each culture were mixed in the same tube, washed, and then fused by the
addition of 50% PEG. The fused protoplasts were diluted and plated on regeneration medium
and incubated until the colonies were sporulating (four days). Spores were harvested and
washed. These spores represent a pool of all the recombinants and parents from the fusion.
A sample of the pooled spores was then used to inoculate a single liquid culture. The culture
was grown to early stationary phase, the mycelia harvested, and protoplasts prepared. 10^8
protoplasts from this "mycelial library" were then fused with themselves by the addition of
50% PEG. The protoplast fusion/regeneration/harvesting/protoplast preparation steps were
repeated two times. The spores resulting from the fourth round of fusion were considered the
"whole genome shuffled library" and they were screened for the frequency of the 16 possible
combinations of the four markers. The results from each round of fusion are shown in Figure 33
and in the following table.

The results of the shuffling procedure are set forth in Figure 33. In particular,
adding rounds of recombination prior to selection produced significant increases in the number
of clones which incorporated all four of the relevant selectable markers, indicating that the
population became increasingly diverse by recursive pooling and sporulation. Additional
results are set forth in the following table.
The four strains of the four parent shuffling were each auxotrophic for three
and prototrophic for one of four possible nutritional markers: arginine (A), cystine (C), proline
(P), and/or uracil (U). Spores from each fusion were plated on each of the 16 possible
combinations of these four nutrients, and the percent of the population growing on a
particular medium was calculated as the ratio of the colonies from a selective plate to
those growing on a plate having all four nutrients (all variants grow on the medium having all
four nutrients, thus the colonies from this plate represent the total viable population). The
corrected percentages for each of the no, one, two, and three marker phenotypes were
determined by subtracting the percentage of cells having additional markers that might grow
on the medium having "unnecessary" nutrients. For example, the number of colonies growing
on no additional nutrients (the prototroph) was subtracted from the number of colonies
growing on any plate requiring nutrients.
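One way to read the correction described above is as a subtraction over subsets: a plate supplying a given set of nutrients supports every cell whose requirements are contained in that set, so the count for cells requiring exactly that set is the plate count minus the counts of all less demanding phenotypes. The following is a minimal sketch under that interpretation; the function names and the example counts are hypothetical and are not data from the disclosure.

```python
from itertools import combinations

NUTRIENTS = "ACPU"  # arginine, cystine, proline, uracil

def subsets(items):
    """All subsets of `items`, as frozensets, smallest first."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def plate_counts_from(genotypes):
    """Colonies expected on each of the 16 media: a cell grows whenever
    every nutrient it requires is supplied."""
    return {supplied: sum(n for required, n in genotypes.items() if required <= supplied)
            for supplied in subsets(NUTRIENTS)}

def corrected_percentages(plate_counts):
    """Percent of the viable population requiring exactly each nutrient set."""
    total = plate_counts[frozenset(NUTRIENTS)]  # all variants grow here
    exact = {}
    for supplied in sorted(subsets(NUTRIENTS), key=len):
        # Cells with fewer requirements also grow on this plate; subtract them.
        also_grow = sum(n for required, n in exact.items() if required < supplied)
        exact[supplied] = plate_counts[supplied] - also_grow
    return {required: 100.0 * n / total for required, n in exact.items()}

if __name__ == "__main__":
    # Hypothetical population keyed by the nutrients each class requires.
    population = {frozenset("CPU"): 40, frozenset("APU"): 30,
                  frozenset("A"): 20, frozenset(): 10}
    for required, pct in corrected_percentages(plate_counts_from(population)).items():
        if pct:
            print("".join(sorted(required)) or "prototroph", pct)
```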

II WHOLE GENOME SHUFFLING THROUGH ORGANIZED 
HETERODUPLEX SHUFFLING 
A new procedure to optimize phenotypes of interest by heteroduplex shuffling
of cosmid libraries of the organism of choice is provided. This procedure does not require
protoplast fusion and is applicable to bacteria for which well-established genetic systems are
available, including cosmid cloning, transformation, in vitro packaging/transfection and
plasmid transfer/mobilization. Microorganisms that can be improved by these methods include
Escherichia coli, Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas spp.,
Rhizobium spp., Xanthomonas spp., and other Gram-negative organisms. This method is also
applicable to Gram-positive microorganisms.

A basic procedure for whole genome shuffling through organized heteroduplex
shuffling is set forth in Figure 34.

In step A, chromosomal DNA of the organism to be improved is digested with
suitable restriction enzymes and ligated into a cosmid. The cosmid used for cosmid-based
heteroduplex guided WGS has at least two rare restriction enzyme recognition sites (e.g., SfiI
and NotI) to be used for linearization in subsequent steps. Sufficient cosmids to represent the
complete chromosome are purified and stored in 96-well microtiter dishes. In step B, small
samples of the library are mutagenized in vitro using hydroxylamine or other mutagenic
chemicals. In step C, a sample from each well of the mutagenized collection is used to
transfect the target cells. In step D, the transfectants are assayed (as a pool from each
mutagenized sample-well) for phenotypic improvements. Positives from this assay indicate
that a cosmid from a particular well can confer phenotypic improvements and thus contains
large genomic fragments that are suitable targets for heteroduplex mediated shuffling. In step
E, the transfected cells harboring a mutant library of the identified cosmid(s) are separated by
plating on solid media and screened for independent mutants conferring an improved
phenotype. In step F, DNA from positive cells is isolated and pooled by origin. In step G, the
selected cosmid pools are divided so that one sample can be digested with SfiI and the other
with NotI. These samples are pooled, denatured, reannealed, and religated.

In step H, target cells are transfected with the resulting heteroduplexes and
propagated to allow recombination to occur between the strands of the heteroduplexes in
vivo. The transfectants can be screened (the population will represent the pairwise
recombinants) or, commonly, as represented by step I, the recombined cosmids are further
shuffled by recursive in vitro heteroduplex formation and in vivo recombination (to generate a
complete combinatorial library of the possible mutations) prior to screening. An additional
mutagenesis step could also be added for increased diversity during the shuffling process.

In step J, once several cosmids harboring different distributed loci have been
improved, they are combined into the same host by chromosome integration. This organism
can be used directly or subjected to a new round of heteroduplex guided whole genome
shuffling.
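Step A calls for "sufficient cosmids to represent the complete chromosome." The disclosure does not say how that number is chosen; one common estimate is the Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where P is the desired probability that any given sequence is present and f is the insert size as a fraction of the genome. The sketch below applies it; the genome size, insert size and probability are illustrative assumptions.

```python
import math

def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Clarke-Carbon estimate of the number of independent clones needed so
    that any given sequence is represented with the stated probability."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))

def plates_needed(n_clones, wells=96):
    """Number of 96-well dishes required to array the library one clone per well."""
    return math.ceil(n_clones / wells)

if __name__ == "__main__":
    # Assumed values: a 4.6 Mb genome and 40 kb cosmid inserts.
    n = clones_needed(4_600_000, 40_000, probability=0.99)
    print(n, "cosmids,", plates_needed(n), "plates")
```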

EXAMPLES 

The following examples are offered to illustrate, but not to limit, the present
invention. Variations will be apparent to one of skill upon review of the present disclosure.

A. EXAMPLE 1: EVOLVING HYPER-RECOMBINOGENIC RECA
RecA protein is implicated in most E. coli homologous recombination
pathways. Most mutations in recA inhibit recombination, but some have been reported to
increase recombination (Kowalczykowski et al., Microbiol. Rev. 58, 401-465 (1994)). The
following example describes evolution of RecA to acquire hyper-recombinogenic activity
useful in in vivo shuffling formats.

Hyperrecombinogenic RecA was selected using a modification of a system
developed by Shen et al. (Genetics 112, 441-457 (1986); Shen et al., Mol. Gen. Genet. 218,
358-360 (1989)) to measure the effect of substrate length and homology on recombination
frequency. Shen & Huang's system used plasmids and bacteriophages with small (31-430 bp)
regions of homology at which the two could recombine. In a restrictive host, only phage that
had incorporated the plasmid sequence were able to form plaques.
For shuffling of recA, endogenous recA and mutS were deleted from host
strain MC1061. In this strain, no recombination was seen between plasmid and phage. E. coli
recA was then cloned into two of the recombination vectors (Bp221 and πMT631c18).
Plasmids containing cloned RecA were able to recombine with homologous phage: λV3 (430
bp identity with Bp221), λV13 (430 bp stretch of 89% identity with Bp221) and λlink H (31 bp
identity with πMT631c18, except for 1 mismatch at position 18).

The cloned RecA was then shuffled in vitro using the standard DNase
treatment followed by PCR-based reassembly. Shuffled plasmids were transformed into the
non-recombining host strain. These cells were grown up overnight, infected with phage λV3,
λV13 or λlink H, and plated onto NZCYM plates in the presence of a 10-fold excess of
MC1061 lacking plasmid. The more efficiently a recA allele promotes recombination between
plasmid and phage, the more highly the allele is represented in the bacteriophage DNA.
Consequently, harvesting all the phage from the plates and recovering the recA genes selects
for the most recombinogenic recA alleles.
Recombination frequencies for wild type and a pool of hyper-recombinogenic
RecA after 3 rounds of shuffling were as follows:

Cross                  Wild Type      Hyper Recom
BP221 x V3             6.5 x 10^-4    3.3 x 10^-2
BP221 x V13            2.2 x 10^-5    1.0 x 10^-3
πMT631c18 x link H     8.7 x 10^-6    4.7 x 10^-5

These results indicate a 50-fold increase in recombination for the 430 bp substrate, and a
5-fold increase for the 31 bp substrate.
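The fold increases quoted above follow directly from the ratio of the two frequency columns. A short check, using the values from the table (the dictionary layout and function name are illustrative only):

```python
# Recombination frequencies from the table: (wild type, hyper-recombinogenic pool).
FREQUENCIES = {
    "BP221 x V3": (6.5e-4, 3.3e-2),
    "BP221 x V13": (2.2e-5, 1.0e-3),
    "MT631c18 x link H": (8.7e-6, 4.7e-5),
}

def fold_increase(wild_type, evolved):
    """Ratio of evolved to wild-type recombination frequency."""
    return evolved / wild_type

if __name__ == "__main__":
    for cross, (wild_type, evolved) in FREQUENCIES.items():
        print(f"{cross}: {fold_increase(wild_type, evolved):.1f}-fold")
```

The two 430 bp substrates give ratios of roughly 51 and 45, and the 31 bp substrate gives roughly 5.4, consistent with the rounded figures in the text.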

The recombination frequencies between BP221 and V3 for five individual clonal
isolates are shown below, and the DNA and protein sequences and alignments thereof are
included in Figs. 12 and 13.

Wildtype:  1.6 x 10^-4
Clone 2:   9.8 x 10^-3 (61 x increase)
Clone 4:   9.9 x 10^-3 (62 x increase)
Clone 5:   6.2 x 10^-3 (39 x increase)
Clone 6:   8.5 x 10^-3 (53 x increase)
Clone 13:  0.019 (116 x increase)

Clones 2, 4, 5, 6 and 13 can be used as the substrates in subsequent rounds of shuffling, if
further improvement in recA is desired. Not all of the variations from the wildtype recA
sequence necessarily contribute to the hyperrecombinogenic phenotype. Silent variations can
be eliminated by backcrossing. Alternatively, variants of recA incorporating individual points
of variation from wildtype at codons 5, 18, 156, 190, 236, 268, 271, 283, 304, 312, 317, 345
and 353 can be tested for activity.

B. EXAMPLE 2: WHOLE ORGANISM EVOLUTION FOR HYPER-RECOMBINATION

The possibility of selection for an E. coli strain with an increased level of
recombination was indicated from phenotypes of wild-type, ΔrecA, mutS and ΔrecA mutS
strains following exposure to mitomycin C, an inter-strand cross-linking agent of DNA.

Exposure of E. coli to mitomycin C causes inter-strand cross-linking of DNA,
thereby blocking DNA replication. Repair of the inter-strand DNA cross-links in E. coli
occurs via a RecA-dependent recombinational repair pathway (Friedberg et al., DNA Repair
and Mutagenesis (1995) pp. 191-232). Processing of cross-links during repair results in
occasional double-strand DNA breaks, which too are repaired by a RecA-dependent
recombinational route. Accordingly, recA- strains are significantly more sensitive than
wildtype strains to mitomycin C exposure. In fact, mitomycin C is used in simple disk-
sensitivity assays to differentiate between RecA+ and RecA- strains.

In addition to its recombinogenic properties, mitomycin C is a mutagen.
Exposure to DNA damaging agents, such as mitomycin C, typically results in the induction of
the SOS response, which includes error-prone repair of DNA
damage (Friedberg et al., 1995, supra, at pp. 465-522).

Following phage P1-mediated generalized transduction of the Δ(recA-
srl)::Tn10 allele (a nonfunctional allele) into wild-type and mutS E. coli, tetracycline-resistant
transductants were screened for a recA- phenotype using the mitomycin C-sensitivity assay. It
was observed that, in LB overlays with a 1/4 inch filter disk saturated with 10 µg of mitomycin C,
following 48 hours at 37°C, growth of the wild-type and mutS strains was inhibited within a
region with a radius of about 10 mm from the center of the disk. DNA cross-linking at high
levels of mitomycin C saturates recombinational repair, resulting in lethal blockage of DNA
replication. Both strains gave rise to occasional colony forming units within the zone of
inhibition, although the frequency of colonies was ~10-20-fold higher in the mutS strain. This
is presumably due to the increased rate of spontaneous mutation of mutS backgrounds. A
side-by-side comparison demonstrated that the ΔrecA and ΔrecA mutS strains were
significantly more sensitive to mitomycin C, with growth inhibited in a region extending about
15 mm from the center of the disk. However, in contrast to the recA+ strains, no Mit^r






WO 00/04190 PCT/US99/15972 
individuals were seen within the region of growth inhibition, not even in the mutS background.
The appearance of Mit^r individuals in recA+ backgrounds, but not in ΔrecA backgrounds,
indicates that Mit^r is dependent upon a functional RecA protein and suggests that Mit^r may
result from an increased capacity for recombinational repair of mitomycin C-induced damage.

Mutations which lead to increased capacity for RecA-mediated recombinational
repair may be diverse, unexpected, unlinked, and potentially synergistic. A recursive protocol
alternating selection for Mit^r and chromosomal shuffling evolves individual cells with a
dramatically increased capacity for recombination.

The recursive protocol is as follows. Following exposure of a mutS strain to
mitomycin C, Mit^r individuals are pooled and cross-bred (e.g., via Hfr-mediated
chromosomal shuffling or split-pool generalized transduction, or protoplast fusion). Alleles
which result in Mit^r and presumably result in an increased capacity for recombinational repair
are shuffled among the population in the absence of mismatch repair. In addition, error-prone
repair following exposure to mitomycin C can introduce new mutations for the next round of
shuffling. The process is repeated using increasingly more stringent exposures to mitomycin
C. A number of parallel selections are performed in the first round as a means of generating a
variety of alleles. Optionally, recombinogenicity of isolates can be monitored for
hyper-recombination using a plasmid x plasmid assay or a chromosome x chromosome assay
(e.g., that of Konrad, J. Bacteriol. 130, 167-172 (1977)).
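The logic of alternating selection and shuffling can be illustrated with a toy model. This is a sketch under strong simplifying assumptions that are not part of the disclosure: resistance is taken to be the simple sum of beneficial alleles at unlinked loci, selection keeps a fixed top fraction of the pool each round (so the absolute resistance needed to survive rises as the pool improves, standing in for increasingly stringent exposure), and mutation flips alleles at a low rate. All names and parameter values are hypothetical.

```python
import random

def resistance(genome):
    """Toy phenotype: number of beneficial alleles carried."""
    return sum(genome)

def shuffle_pool(survivors, size, rng, mutation_rate):
    """Poolwise recombination: each child takes every locus from one of two
    randomly chosen survivors, followed by rare random mutation."""
    n_loci = len(survivors[0])
    children = []
    for _ in range(size):
        a, b = rng.choice(survivors), rng.choice(survivors)
        child = []
        for i in range(n_loci):
            allele = rng.choice((a[i], b[i]))
            if rng.random() < mutation_rate:
                allele ^= 1  # mutation flips the allele
            child.append(allele)
        children.append(tuple(child))
    return children

def evolve(rounds=5, pop_size=200, n_loci=20, start_freq=0.05,
           keep=0.2, mutation_rate=0.01, seed=0):
    """Alternate selection and shuffling; return mean resistance per round."""
    rng = random.Random(seed)
    pop = [tuple(1 if rng.random() < start_freq else 0 for _ in range(n_loci))
           for _ in range(pop_size)]
    means = [sum(map(resistance, pop)) / pop_size]
    for _ in range(rounds):
        ranked = sorted(pop, key=resistance, reverse=True)
        survivors = ranked[:max(2, int(keep * pop_size))]  # selection step
        pop = shuffle_pool(survivors, pop_size, rng, mutation_rate)
        means.append(sum(map(resistance, pop)) / pop_size)
    return means

if __name__ == "__main__":
    print([round(m, 2) for m in evolve()])
```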

C. EXAMPLE 3: WHOLE GENOME SHUFFLING OF STREPTOMYCES
COELICOLOR TO IMPROVE THE PRODUCTION OF γ-ACTINORHODIN

To improve the production of the secondary metabolite γ-actinorhodin from S.
coelicolor, the entire genome of this organism is shuffled either alone or with its close relative
S. lividans. In the first procedure described below, genetic diversity arises from random
mutations generated by chemical or physical means. In the second procedure, genetic
diversity arises from the natural diversity existing between the genomes of S. coelicolor and S.
lividans.

Spore suspensions of S. coelicolor are resuspended in sterile water and
subjected to UV mutagenesis such that 1% of the spores survive (~600 "energy" units using a
Stratalinker, Stratagene), and the resulting mutants are "grown out" on sporulation agar.
Individual spores represent uninucleate cells harboring different mutations within their
genome. Spores are collected, washed, and plated on solid medium, preferably soy agar, R5,
or other rich medium that results in sporulating colonies. Colonies are then imaged and picked
randomly using an automated colony picker, for example the Q-bot (Genetix). Alternatively,
colonies producing larger or darker halos of blue pigment are picked in addition or
preferentially.

The colonies are inoculated into 96 well microtitre plates containing 1/3 x
YEME medium (170 µl/well). Two sterile 3 mm glass beads are added to each well, and the
plates are shaken at 150-250 rpm at 30°C in a humidified incubator. The plates are incubated
up to 7 days and the cell supernatants are assayed for γ-actinorhodin production.

To assay, 50 µL of supernatant is added to 100 µL of distilled water in a 96 well
polypropylene microtitre plate, and the plate is centrifuged at 4000 rpm to pellet the mycelia.
50 µL of the cleared supernatant is then removed and added to a flat bottom polystyrene 96
well microtitre plate containing 150 µL 1 M KOH in each well. The resulting plates are then
read in a microtitre plate reader measuring the absorbance at 654 nm of the individual samples
as a measure of the γ-actinorhodin content.
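Selecting improved cultures from the plate-reader output can be sketched as below. The disclosure states only that cultures producing more than wildtype are isolated; the specific cutoff used here (wildtype mean plus two standard deviations of the wildtype control wells), the well names and the readings are assumptions made for illustration.

```python
from statistics import mean, stdev

def pick_improved(readings, wildtype_wells, n_sd=2.0):
    """Return wells whose absorbance at 654 nm exceeds the wildtype controls,
    ordered from highest to lowest reading.

    readings: mapping of well name to absorbance at 654 nm
    wildtype_wells: wells inoculated with the wildtype control (at least two)
    n_sd: assumed stringency, in standard deviations above the control mean
    """
    controls = [readings[well] for well in wildtype_wells]
    cutoff = mean(controls) + n_sd * stdev(controls)
    hits = {well: a for well, a in readings.items()
            if well not in wildtype_wells and a > cutoff}
    return sorted(hits, key=hits.get, reverse=True)

if __name__ == "__main__":
    # Hypothetical readings: A1-A3 are wildtype controls, B1-B3 are mutants.
    plate = {"A1": 0.50, "A2": 0.52, "A3": 0.48,
             "B1": 0.49, "B2": 0.95, "B3": 0.70}
    print(pick_improved(plate, ["A1", "A2", "A3"]))
```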

Mycelia from cultures producing more γ-actinorhodin
than that of wildtype S. coelicolor are then isolated. These are propagated on solid
sporulation medium, and spore preparations of each improved mutant are made. From these
preparations protoplasts of each of the improved mutants are generated, pooled together, and
fused (as described in Genetic Manipulation of Streptomyces - A Laboratory Manual,
Hopwood, D.A., et al.). The fused protoplasts are regenerated and allowed to sporulate.
The resulting spores are plated, picked and screened as above or,
to increase the representation of multiparent progeny, are used to generate protoplasts and
fused again (or several times, as described previously for methods to effect poolwise
recombination) before further picking and screening.

Further improved mutants result from the combination of two or more
mutations that have additive or synergistic effects on γ-actinorhodin production. Further
improved mutants can be again mated by poolwise protoplast fusion, or they can be exposed
to random mutagenesis to create a new population of cells to be screened and mated for
further improvements.

As an alternative to random mutagenesis as a source of genetic diversity, natural
diversity can be employed. In this case, protoplasts generated from wildtype S. coelicolor and
S. lividans are fused together. Spores from the regenerated progeny of this mating are then
either repetitively fused and regenerated to create additional diversity, or they are separated on
solid medium, picked, and screened for enhanced production of γ-actinorhodin. As before, the
improved subpopulations are mated together to identify further improved family shuffled
organisms.

D. EXAMPLE 4: A HIGH THROUGHPUT ACTINORHODIN ASSAY
Additional details on a high-throughput shuffling actinorhodin assay used to
select mycelia are set forth in Figure 32. In brief, shufflants were picked by standard
automated procedures using a Q-bot robotic system and transferred to standard 96 well plates.
After incubation at 30°C for 7 days, the resulting mycelia were centrifuged, and a sample of
cell supernatant was removed and mixed with 0.1 M KOH in a 96 well plate and the
absorbance read at 654 nm. The best positive clones were selected and grown in shake flasks.

Approximately 10^9 protoplasts were centrifuged at 3,000 rpm for 7 min. When
more than one strain was used, equal numbers of protoplasts were obtained from each strain.
Most of the buffer was removed and the pellet suspended in the remaining buffer (~25 µl total
volume) by gentle flicking. 0.5 ml of 50% PEG1000 was added and mixed with the protoplasts
by gently pipetting in and out 2 times. The mixture was then incubated for 2 minutes. 0.5 ml
of P buffer was added and gently mixed (this is the fusion at a dilution of 10^-1). A ten-fold
serial dilution was performed in P buffer. After 2 minutes, dilutions were plated at 10^-1, 10^-2
and 10^-3 onto R5 plates with 50 µl of each, 2-3 plates each dilution (for plating, ~20 3 mm
glass beads were used, with gentle shaking). As a first control, for regeneration of protoplasts,
the same number of protoplasts were used as above, adding P buffer to a total of 1 ml (this is the
regeneration at dilution 10^-1). The mixture was further diluted (10X) in P buffer. The
dilutions were plated at 10^-3, 10^-4 and 10^-5 onto R5 plates with 50 µl of each. As a second
control (a non-protoplasting mycelia background check), the same number of protoplasts as
above were used, adding 0.1% SDS to a total of 1 ml (this is the background at dilution 10^-1).
After further 10X dilution in 0.1% SDS, the dilutions were plated at 10^-1, 10^-2 and 10^-3 onto R5
plates with 50 µl of each. The plates were air dried and incubated at 30°C for 3 days.

The number of colonies was counted from each plate (those that were
countable), using the number of regenerated protoplasts as 100% and calculating the
percentage of background (usually less than one) and fusion survival (usually greater than 10).
The fusion plates were incubated at 30°C for 2 more days until all colonies were well
sporulated. Spores were harvested from those plates having fewer than 5,000 colonies. Spores
were filtered through cotton and washed once with water, suspended in 20% glycerol and
counted. Those spores are used for further study, culture inoculation or simply stored at
-20°C.
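The background and fusion survival percentages can be computed from the plate counts as sketched below. The dilution labels follow the protocol (each 1 ml mixture is called the 10^-1 dilution) and 50 µl is plated; the colony numbers in the example and the function names are hypothetical.

```python
def cfu_per_ml(colonies, dilution, plated_ml=0.05):
    """Colony forming units per ml, referred back to dilution 1, from a
    countable plate. `dilution` is the labelled dilution of the plated
    sample, e.g. 1e-3."""
    return colonies / (dilution * plated_ml)

def percent_of_regeneration(sample_cfu, regeneration_cfu):
    """Express a count as a percentage of the regeneration control (100%)."""
    return 100.0 * sample_cfu / regeneration_cfu

if __name__ == "__main__":
    # Hypothetical countable plates.
    regeneration = cfu_per_ml(300, 1e-3)   # regeneration control
    fusion = cfu_per_ml(60, 1e-3)          # PEG-treated protoplasts
    background = cfu_per_ml(2, 1e-1)       # SDS-treated control
    print(f"fusion survival: {percent_of_regeneration(fusion, regeneration):.1f}%")
    print(f"background: {percent_of_regeneration(background, regeneration):.4f}%")
```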
E. EXAMPLE 4: WHOLE GENOME SHUFFLING OF RHODOCOCCUS
FOR TWO-PHASE REACTION CATALYSTS

This example provides an example of how to apply the techniques described
herein to technologies that allow the generic improvement of biotransformations catalyzed by
whole cells. Rhodococcus was selected as an initial target because it is both representative of
systems in which molecular biology is rudimentary (as is common in whole cell catalysts, which
are generally selected organisms) and an organism that
can catalyze two-phase reactions.

The goal of whole genome shuffling of Rhodococcus is to obtain an increase in 
flux through any chosen pathway. The substrate specificity of the pathway can be altered to 
accept molecules which are not currently substrates. Each of these features can be selected for 
during whole genome shuffling. 

During whole genome shuffling, libraries of shuffled enzymes and pathways are 
made and transformed into Rhodococcus and screened, preferably by high-throughput assays 
for improvements in the target phenotype, e.g., by mass spectroscopy for measuring the 
product. 

As noted above, the chromosomal context of genes can have dramatic effects
on their activities. Cloning of the target genes onto a small plasmid in Rhodococcus can
dramatically reduce the overall pathway activity (by a factor of 5- to 10-fold or more). Thus,

activity of wild-type strain. By contrast, integration of the genes into random sites in the 
Rhodococcus chromosome can result in a significant (5- to 10-fold) increase in activity. A
similar phenomenon was observed in the recent directed evolution in E. coli of an arsenate
resistance operon (originally from Staphylococcus aureus) by DNA shuffling. Shuffling of this
plasmid produced sequence changes that led to efficient integration of the operon into the E.
coli chromosome. Of the total 50-fold increase in arsenate resistance obtained by directed
evolution of the three gene pathway, approximately 10-fold resulted from this integration into
the chromosome. The position within the chromosome is also likely to be important: for
example, sequences close to the replication origin have an effectively higher gene dosage and
therefore greater expression level.
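The gene dosage effect near the replication origin can be estimated with the standard Cooper-Helmstetter relationship for exponentially growing bacteria, in which the average copy number of a locus relative to the terminus is 2^(C(1 - m)/τ), where m is the fractional distance from the origin (0) to the terminus (1), C is the time needed to replicate the chromosome and τ is the doubling time. This relationship is not given in the disclosure; the sketch below applies it with illustrative values for C and τ.

```python
def relative_dosage(position, c_period_min, doubling_time_min):
    """Average copy number of a locus relative to the terminus.

    position: fractional distance from origin (0.0) to terminus (1.0)
    c_period_min: time to replicate the chromosome, in minutes
    doubling_time_min: culture doubling time, in minutes
    """
    return 2 ** (c_period_min * (1.0 - position) / doubling_time_min)

if __name__ == "__main__":
    # Assumed values: C = 40 min, with fast (20 min) and slow (80 min) doubling.
    for tau in (20, 80):
        print(f"doubling time {tau} min: origin/terminus dosage = "
              f"{relative_dosage(0.0, 40, tau):.2f}")
```

The faster the culture grows relative to its replication time, the larger the dosage advantage of origin-proximal integration sites.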

In order to fully exploit unpredictable chromosomal position effects, and to
incorporate them into a directed evolution strategy which utilizes multiple cycles of mutation,
recombination and selection, genes are manipulated in vitro and then transferred to an optimal
chromosomal position. Recombination between plasmid and chromosome occurs in two
different ways. Integration takes place at a position where there is significant sequence
homology between plasmid and chromosome, i.e., by homologous recombination. Integration
also takes place where there is no apparent sequence identity, i.e., by non-homologous
recombination. These two recombination mechanisms are effected by different cellular
machineries and have different potential applications in directed evolution.

To combine the increase in activity that resulted from gene duplication and
chromosomal integration of the target pathway with the powerful technique of DNA shuffling,
libraries of shuffled genes are made in vitro, and integrated into the chromosome in place of
the wild-type genes by homologous recombination. Recombinants are then screened for
increased activity. This process is optionally made recursive as discussed herein. The best
Rhodococcus variants are pooled, and the pool divided in two. Genes are cloned out of the
pool by PCR, shuffled together and re-integrated into the chromosomes of the other half of the
pool by homologous recombination. Recombinants are once again screened, the best taken
and pooled and the process optionally repeated.

Sometimes there are complex interactions between enzymes catalyzing
successive reactions in a pathway. Sometimes the presence of one enzyme can adversely
affect the activities of others in the pathway. This can be the result of protein-protein
interactions, or inhibition of one enzyme by the product of another, or an imbalance of primary
or secondary metabolism.

This problem is overcome by DNA shuffling, which produces solutions in the
target gene cluster that bring about improvements in whatever trait is screened. An alternative
approach, which can solve not only this problem, but also anticipated future rate limiting steps
such as supply of reducing power and substrate transportation, is complementation by
overexpression of other as yet unknown genomic sequences.

A library of Rhodococcus genomic DNA in a multicopy Rhodococcus vector
such as pRGl is first made. This is transformed into Rhodococcus and transformants are
screened for increases in the desired phenotype. Genomic fragments which result in increased
pathway activity are evolved by DNA shuffling to further increase their beneficial effect on a
selected property. This approach requires no sequence information, nor any knowledge or
assumptions about the nature of protein or pathway interactions, or even of the rate-limiting
step; it relies only on detection of the desired phenotype. This sort of random cloning and
subsequent evolution by DNA shuffling of positively interacting genomic sequences is
extremely powerful and generic. A variety of sources of genomic DNA are used, from
isogenic strains to more distantly related species with potentially desirable properties. In
addition, the technique is, in principle, applicable to any microorganism for which the
molecular biology basics of transformation and cloning vectors are available, and for any
property which can be assayed, preferably in a high-throughput format.

Homologous recombination within the chromosome is used to circumvent the
limitations of plasmid evolution and size restrictions, and is optionally used to alter central
metabolism. The strategy is similar to that described above for shuffling genes within their
chromosomal context, except that no in vitro shuffling occurs. Instead, the parent strain is
treated with mutagens such as ultraviolet light or nitrosoguanidine, and improved mutants are
selected. The improved mutants are pooled and split. Half of the pool is used to generate
random genomic fragments for cloning into a homologous recombination vector. Additional
genomic fragments are derived from related species with desirable properties (in this case
higher metabolic rates and the ability to grow on cheaper carbon sources). The cloned
genomic fragments are homologously recombined into the genomes of the remaining half of
the mutant pool, and variants with improved phenotypes are selected. These are subjected to
a further round of mutagenesis, selection and recombination. Again this process is entirely
generic for the improvement of any whole cell biocatalyst for which a recombination vector
and an assay can be developed. Recursive recombination can be performed to increase the
diversity of the pool at any step in the process.

Efficient homologous recombination is important for the
chromosomal evolution strategies outlined above. Non-homologous recombination results in
a futile integration (upon selection) followed by excision (following counterselection) of the
entire plasmid. Alternatively, if no counter-selection were used, there is integration of more
and more copies of plasmid/genomic sequences, which is both unstable and also requires an
additional selectable marker for each cycle. Furthermore, additional non-homologous
recombination will occur at random positions and may or may not lead to good expression of
the integrated sequence.

F. EXAMPLE 5: INCREASING THE RATE OF HOMOLOGOUS
RECOMBINATION IN RHODOCOCCUS

A genetic approach is used to increase the rate of homologous recombination
in Rhodococcus. Both targeted and non-targeted strategies to evolve increases in homologous
recombination are used. Rhodococcus recA is evolved by DNA shuffling to increase its ability
to promote homologous recombination within the chromosome. The recA gene was chosen
because there are variants of recA known to result in increased rates of homologous
recombination in E. coli, as discussed above.

The recA gene from Rhodococcus is DNA shuffled and cloned into a plasmid 
that carries a selectable marker and a disrupted copy of the Rhodococcus homolog of the 
S. cerevisiae URA3 gene (a gene which also confers sensitivity to the uracil precursor analogue 
5-fluoroorotic acid). Homologous integration of the plasmid into the chromosome disrupts 
the host uracil synthesis pathway, leading to a strain that carries the selectable marker and is 
also resistant to 5-fluoroorotic acid. The shuffled recA genes are integrated, and can be 
amplified from the chromosome, shuffled again and cloned back into the integration-selection 
vector. At each cycle, the recA genes promoting the greatest degree of homologous 
recombination are those that are the best represented as integrants in the genome. Thus a 
Rhodococcus recA with enhanced homologous recombination-promoting activity is evolved. 
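
The enrichment logic of this cycle can be illustrated with a small numerical sketch. It assumes, purely for illustration, that each recA variant integrates at a fixed relative rate and that the integrant pool after selection is proportional to starting frequency multiplied by that rate. The variant names, starting frequencies and rates below are hypothetical and are not taken from this disclosure; the re-shuffling between cycles is deliberately omitted.

```python
def enrich(frequencies, integration_rates, cycles):
    """Return variant frequencies among integrants after repeated selection cycles.

    Toy model: in each cycle a variant's share of the integrant pool is
    proportional to (current frequency) x (its integration rate).
    """
    freqs = dict(frequencies)
    for _ in range(cycles):
        weighted = {v: freqs[v] * integration_rates[v] for v in freqs}
        total = sum(weighted.values())
        freqs = {v: w / total for v, w in weighted.items()}
    return freqs


# Hypothetical library: mostly wild-type recA plus two rarer shuffled variants.
start = {"wild_type": 0.90, "variant_A": 0.09, "variant_B": 0.01}
# Hypothetical homologous integration rates per cell (assumed values).
rates = {"wild_type": 1e-6, "variant_A": 5e-6, "variant_B": 2e-5}

after = enrich(start, rates, cycles=3)
for name, freq in sorted(after.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {freq:.3f}")
```

Under these assumed numbers the rarest variant, which has the highest integration rate, dominates the integrant pool after three cycles, which is the behaviour the paragraph above relies on.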

Many other genes are involved in several different homologous recombination 
pathways, and mutations in some of these proteins may also lead to cells with an increased 
level of homologous recombination. For example, mutations in E. coli DNA polymerase III 
have recently been shown to increase RecA-independent homologous recombination. 
Resistance to DNA cross-linking agents such as nitrous acid, mitomycin and ultraviolet is 
dependent on homologous recombination. Thus, increases in the activity of this pathway 
result in increased resistance to these agents. Rhodococcus cells are mutagenized and selected 
for increased tolerance to DNA cross-linking agents. These mutants are tested for the rate at 
which a plasmid will integrate homologously into the chromosome. Genomic libraries are 
prepared from these mutants, combined as described above, and used to evolve a strain with 
even higher levels of homologous recombination. 
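
As a rough sketch of how the mutants might be compared, the homologous integration rate can be expressed as integrant colonies per viable cell plated, and the mutants ranked by that value. The strain names and colony counts below are hypothetical placeholders, not data from this disclosure.

```python
def integration_frequency(integrant_colonies, viable_cells):
    """Integrants recovered per viable cell plated."""
    if viable_cells <= 0:
        raise ValueError("viable_cells must be positive")
    return integrant_colonies / viable_cells


def rank_mutants(counts):
    """Rank strains by integration frequency, highest first.

    counts maps strain name -> (integrant colonies, viable cells plated).
    """
    freqs = {
        name: integration_frequency(integrants, viable)
        for name, (integrants, viable) in counts.items()
    }
    return sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical plate counts for a parent strain and three tolerant mutants.
plate_counts = {
    "parent": (12, 1_000_000_000),
    "mutant_1": (85, 1_000_000_000),
    "mutant_2": (9, 500_000_000),
    "mutant_3": (240, 2_000_000_000),
}

ranking = rank_mutants(plate_counts)
for name, freq in ranking:
    print(f"{name}: {freq:.2e}")
```

Strains at the top of such a ranking would be the natural candidates for preparing the genomic libraries mentioned above.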

The foregoing description of the preferred embodiments of the present 
invention has been presented for purposes of illustration and description. They are not 
intended to be exhaustive or to limit the invention to the precise form disclosed, and many 
modifications and variations are possible in light of the above teaching. Such modifications 
and variations which may be apparent to a person skilled in the art are intended to be within 
the scope of this invention. All patent documents and publications cited above are 
incorporated by reference in their entirety for all purposes to the same extent as if each item 
were so individually denoted. 

WHAT IS CLAIMED IS: 

1. A method of producing a library of diverse multicellular organisms, the 
method comprising: 

providing a pool of male gametes and a pool of female gametes, wherein at least one of 
the male pool or the female pool comprises a plurality of different gametes derived from different 
strains of a species or different species, wherein the male gametes fertilize the female gametes; 

permitting at least a portion of the resulting fertilized gametes to grow into reproductively 
viable organisms; 

repeatedly crossing the reproductively viable organisms to produce a library of diverse 
organisms; and, 

selecting the library for a desired trait or property. 

2. The method of claim 1, wherein the library of diverse organisms comprises a 
plurality of plants. 



3. The method of claim [...] F[...]coideae, Poacoideae, Agrostis, Phleum, Dactyl[...], 
Secale, Avena, Hordeum, Saccharum, Poa, Festuca, Stenotaphrum, Cynodon, Coix, Olyreae, 
Phareae, Compositae, and Leguminosae. 



4. The method of claim 2, wherein the plants are selected from corn, rice, 
wheat, rye, oats, barley, pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, soybean, 
clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, sweetpea, sorghum, millet, sunflower, 
and canola. 

5. The method of claim 1, wherein the library of diverse organisms comprises a 
plurality of animals. 

6. The method of claim 5, wherein the animals are selected from non-human 
mammals and fish. 

7. The library produced by the method of claim 1. 

8. The method of claim 1, further comprising: 

crossing a plurality of selected library members by pooling gametes from the selected 
members and repeatedly crossing any resulting additional reproductively viable organisms to 
produce a second library of diverse organisms; and, 

selecting the second library for a desired trait or property. 

9. The second library made by the method of claim 8. 

10. A method of evolving a cell to acquire a desired property, comprising: 

(i.) forming protoplasts of a population of different cells; 

(ii.) fusing the protoplasts to form hybrid protoplasts, in which genomes from the 
protoplasts recombine to form hybrid genomes; 

(iii.) incubating the hybrid protoplasts under conditions promoting regeneration of 
cells, thereby producing regenerated cells; 

(iv.) repeatedly forming protoplasts from the regenerated cells, fusing the 
protoplasts to form hybrid protoplasts, in which genomes from the protoplasts recombine to form 
additional hybrid genomes; incubating the additional hybrid protoplasts under conditions 
promoting regeneration of cells, thereby producing additional regenerated cells; and, 

(v.) selecting or screening to isolate regenerated cells or additionally regenerated 
cells that have evolved toward acquisition of the desired property. 

11. The method of claim 10, wherein the desired property is selected from: heat 
tolerance, ethanol production, ethanol tolerance, acid, improved production and maintenance of 
enzyme cofactors, improved production and maintenance of NAD(P)H, and improved glucose 
transport. 

12. The method of claim 10, further comprising repeating steps (i.)-(v.) with 
regenerated cells in step (iii.) or additional regenerated cells in step (iv.) being used to form the 
protoplasts in step (i.) until the regenerated cells have acquired the desired property. 
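
The recursive structure of steps (i.)-(v.) can be pictured with a toy simulation. This is only an illustrative sketch under strong simplifying assumptions: a genome is a short list of 0/1 alleles, "fusion" draws each locus of the hybrid from one of the fused parents, and the "desired property" is the count of 1 alleles. The function names, pool sizes and scoring rule are hypothetical and are not part of the claimed method.

```python
import random


def fuse(parents, rng):
    """Toy 'protoplast fusion': each locus of the hybrid genome is drawn
    from one of the parental genomes at that locus."""
    return [rng.choice(alleles) for alleles in zip(*parents)]


def evolve(population, score, cycles, offspring=16, keep=4,
           parents_per_fusion=2, seed=0):
    """Repeat fusion, regeneration and selection for a number of cycles."""
    rng = random.Random(seed)
    pool = [list(genome) for genome in population]
    for _ in range(cycles):
        # Fuse randomly chosen parents to give hybrid genomes.
        hybrids = [
            fuse(rng.sample(pool, parents_per_fusion), rng)
            for _ in range(offspring)
        ]
        # Select the best-scoring regenerated cells to seed the next cycle.
        pool = sorted(hybrids, key=score, reverse=True)[:keep]
    return pool


# Four hypothetical founder strains, each carrying one beneficial allele.
founders = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
evolved = evolve(founders, score=sum, cycles=5)
for genome in evolved:
    print(genome, sum(genome))
```

Setting `parents_per_fusion` above 2 mimics hybrids that carry more than two parental genomes; `keep` must stay at least as large as `parents_per_fusion` so that enough parents are available for sampling.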

13. The method of claim 10, comprising step (iv.), wherein step (iv.) is performed 
prior to step (v.). 

14. The method of claim 10, wherein the hybrid protoplasts comprise cell[...] 
more than two parental genomes. 

15. The method [...] 
regenerated cells are fungi mycelia. 

16. The method of claim 15, wherein protoplasts are provided by treating mycelia 
or spores with an enzyme. 



17. The method of claim 15, wherein the fungal cells are from a fragile strain, 
lacking capacity for intact cell wall synthesis, whereby protoplasts form spontaneously. 

18. The method of claim 15, further comprising treating the mycelia with an 
inhibitor of cell wall formation to generate protoplasts. 

19. The method of claim 10, further comprising selecting or screening to isolate 
regenerated cells with hybrid genomes free from cells with parental genomes. 



20. The method of claim 10, wherein a first subpopulation of cells contains a first 
marker and the second subpopulation of cells contains a second marker, and the method further 
[...] 

21. The method of claim 10, wherein the first marker is a membrane marker and 
the second marker is a genetic marker. 

22. The method of claim 10, wherein the first marker is a first subunit of a 
heteromeric enzyme and the second marker is a second subunit of the heteromeric enzyme. 

23. The method of claim 10, further comprising transforming protoplasts with a 
library of DNA fragments in at least one cycle. 



24. The method of claim 23, wherein the DNA fragments are accompanied by a 
restriction enzyme. 

25. The method of claim 10, further comprising exposing the protoplasts to 
ultraviolet irradiation in at least one cycle. 

26. The method of claim 10, wherein the desired property is the expression of a 
protein, primary metabolite, or secondary metabolite. 

27. The method of claim 10, wherein the desired property is the secretion of a 
protein or secondary metabolite. 

28. The method of claim 27, wherein the secondary metabolite is selected from 
taxol, cyclosporin A, and erythromycin. 

29. The method of claim 10, wherein the desired property is capacity for meiosis. 

30. The method of claim 10, wherein the desired property is compatibility to form 
a heterokaryon with another strain. 

31. The method of claim 10, further comprising exposing the protoplasts or 
mycelia to a mutagenic agent in at least one cycle. 

32. A method for whole genome shuffling through organized heteroduplex 
shuffling, the method comprising: 

(a). providing chromosomal DNA of an organism which is targeted for shuffling, 
digesting the chromosomal DNA with one or more restriction enzymes, ligating the chromosomal 
DNA into a cosmid, the cosmid comprising at least two rare restriction enzyme recognition sites, 
aliquoting, purifying, and storing sufficient cosmids to represent a complete chromosome; 

(b). mutagenizing aliquots of the library in vitro using a mutagen; 

(c). transfecting a sample from a plurality of the mutagenized aliquots into a population of 
target cells; 

(d). assaying resulting transfectants for phenotypic improvements; 

(e). growing transfected cells harboring a mutant library of the identified cosmid(s) on 
media and screening the resulting cell colonies for independent mutants conferring a desired 
phenotype; 

(f). isolating and pooling DNA from cells identified in the screening; 

(g). dividing the selected pools and digesting at least one sample with a rare-cutting 
restriction enzyme, pooling the cleaved [...] 
and religating the samples; and, 

(h). transfecting target cells with the resulting heteroduplexes and propagating the cells to 
allow recombination to occur between the strands of the heteroduplexes in vivo. 

33. The method of claim 32, further comprising additionally screening the 
transfectants. 

34. The method of claim 32, further comprising further shuffling the 
heteroduplexes by recursive in vitro heteroduplex formation and in vivo recombination prior to 
additionally screening the transfectants. 

35. The method of claim 33, further comprising performing an additional 
mutagenesis step to increase diversity during the shuffling process. 

36. The method of claim 32, further comprising combining one or more 
heteroduplexes into a host chromosome by chromosome integration. 

37. The method of claim 36, further comprising repeating steps (a).-(h)., using 
[...] (a). 

38. The method of claim 32, wherein the cosmid comprises restriction sites for 
SfiI or NotI. 

39. The method of claim 32, wherein the transfectants are assayed as a pool from 
each mutagenized aliquot. 



40. The method of claim 32, wherein a positive assay result indicates that a 
cosmid from a particular aliquot can confer phenotypic improvements and contains large genomic 
fragments that are suitable targets for heteroduplex mediated shuffling. 

41. The method of claim 32, wherein the mutagen is a chemical mutagen. 

42. The method of claim 32, wherein growing transfected cells harboring a 
mutant library of the identified cosmid(s) on media comprises plating the transfected cells on 
solid media. 

[Figure: nucleotide sequence alignment, positions 150 to 1460, shown in blocks of 70 bases 
beneath a consensus line. Rows: reference sequence (label reads "New Minshall" or "New Hinshall"), 
New Clone 2, New Clone 4, New Clone 5, New Clone 6, and complete 13.] 

[Figure: amino acid sequence alignment, residues 1 to 358, shown in blocks of 70 residues 
beneath a consensus line. Rows: orig prot, clone 2 prot, clone 4 prot, clone 5 prot, clone 6 prot, 
and clone 13 prot.] 
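
Differences between aligned rows of this kind can be listed mechanically. The following is a minimal sketch, not a procedure taken from this disclosure: it compares two equal-length, gap-free sequences position by position and reports substitutions in the usual "original residue, position, new residue" form. The two strings are the first 31 residues of the "orig prot" and "clone 2 prot" rows with the spacing removed; the function name is hypothetical.

```python
def substitutions(reference, variant):
    """List substitutions between two aligned, equal-length sequences.

    Positions are 1-based; each entry reads <reference><position><variant>.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# First 31 residues of two rows of the protein alignment (spacing removed).
orig_prot = "MTGVKMAIDENKQKALAAALGQIEKQFGKGS"
clone_2_prot = "MTGVKMAIDENKQKALATALGQIEKQFGKGS"

print(substitutions(orig_prot, clone_2_prot))  # ['A18T']
```

The same function works on the nucleotide rows, provided the two rows are first trimmed to a common coordinate range and contain no gap characters.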
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